
NAME

LoadWorm - WebSite Stress and Validation Tool

DESCRIPTION

LoadWorm is a tool that loads a website with requests and records the resultant performance from a web client's perspective. It can also be used for various investigative purposes, such as validating the website or discovering all the referrers to a page.

It consists of two main parts: the LoadWorm itself, which traverses the website and records what it finds, and the LoadMaster/LoadSlave programs, which generate the load.

The LoadWorm's operation is controlled by a configuration file. The LoadMaster/Slave reads the same configuration file for some of its configurables (proxy, verbosity, etc.), but is controlled mainly through a Tk-based GUI.

The LoadWorm and LoadMaster/Slave work on Windows NT and Unix (tested on Solaris and Linux), or any combination of these systems.

WEBSITE TRAVERSAL

The LoadWorm takes one or more URLs as input (specified in its configuration file, loadworm.cfg).

WEBSITE LOADING

Website loading is performed by the LoadMaster program. The LoadMaster runs on a master computer. One or more LoadSlaves may run on the same computer, or on different computers on the same network. The operator can control all LoadSlaves from the LoadMaster: starting them, pausing them, and tuning the loading rate (e.g., total hits per second).

Which URLs are actually loaded by the LoadSlaves is specified in a file named visits.txt. This is simply a list of fully specified URLs, with CGI parameters, such as the one generated by the LoadWorm. (The PUT method is not yet implemented here.)

The LoadMaster also reads some parameters from the same configuration file that serves the LoadWorm. It conveys these parameters to all the LoadSlaves, as well as transmitting to them the visits list.

Each LoadSlave can be configured with a simple rewrite mechanism to replace specified parameters in each URL with a value received from a previous response. Thus, if the website maintains its session state via a CGI parameter, each slave can log itself in as a separate session. This simple mechanism can be enhanced by modifying the Perl code.

Since it does not need to do any special calculations for laying out the route, a LoadSlave performs its operations more quickly, and uses less memory, than the LoadWorm. This makes it possible to run several slaves on the same host computer. Each LoadSlave must be started manually on its host; this simplifies the security situation, since the LoadMaster does not need to directly control anyone else's computer. Give each LoadSlave the IP address of the LoadMaster when you start it. You can start the LoadMaster first, or all the LoadSlaves first, or in any combination.

Thus, you start the LoadMaster on the master host computer, and start a LoadSlave on each slave computer, passing each slave the address of the LoadMaster.

The IP_ADDR and port number of the LoadMaster are displayed on the LoadMaster GUI when you start it up. The default port number of the LoadMaster is 9676 ("WORM" on a phone pad), but it may come up differently, especially if you're running two LoadMasters on the same computer.
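As a sketch, the invocations might look like the following. The script names, and the way the master's address is passed on the command line, are assumptions for illustration; only the default port 9676 comes from the text above.

```shell
# On the master host (assumed script name):
perl LoadMaster.pl

# On each slave host, giving the master's IP address and port
# (9676 is the default noted above; substitute your master's address):
perl LoadSlave.pl 192.168.1.10:9676
```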

If the LoadMaster crashes, or is turned off, the LoadSlaves will wait patiently for it to come back up, and each will reconnect when it does. To finish a test, you can terminate all the LoadSlaves from the LoadMaster GUI, then terminate the LoadMaster. The owners of the host computers you've borrowed for the load test might want to terminate the test on their computer. They can do that by closing the LoadSlave on their computer, with no ill effect on your test except for the lost data and load.

THE CONFIGURATION FILE

The process of the LoadWorm is controlled by its configuration file. This file is named loadworm.cfg and is found in the current working directory. It is structured like a Windows .ini profile file, with [section] headers marking separate sections, and with parameters and attribute=value pairs within each section. The sections include:

[Mode]

Various modes are set here; depth, timeouts, printing, error management, etc. See "[Mode]".

[Traverse]

URLs listed here are the anchor(s) of the target website. See "[Traverse]".

[Ignore]

URLs listed here will be ignored in the traversal. See "[Ignore]".

[Input]

The user may specify values to be tried as input to each INPUT field in each FORM. See "[Input]".

[Limit]

To prevent infinite recursion, each page is visited a limited number of times (see "Recurse" in "[Mode]"). In this section you can specify different limits for different pages. See "[Limit]".

[ReferersReport]

The webpages that link to the URLs listed here will be recorded as such in a "links" database. See "[ReferersReport]".

[Validation]

User customizable routines to validate the data that is returned for each URL requested. See "[Validation]".

[Proxy]

A URL specifying the location of the proxy for web access, if any. See "[Proxy]".

[NoProxy]

Domain names for which the proxy is not to be used. See "[NoProxy]".

[Credentials]

Authentication credentials for different net locations and realms. See "[Credentials]".

[Mode]

Depth = n

The loadworm will go to a maximum of 'n' links down from the anchor URL. Depth=1 would load only the anchor page, and none of its links.

Random = {0,1}

If non-zero, then links will be traversed in random order, rather than in the order that they appear in the visits file. A value of 1 will traverse all links in random order.

Recurse = n

Each URL will be traversed only once, unless the Recurse value is more than one. Then each URL will be traversed the number of times specified by Recurse.

Timeout = secs

Specifies the timeout period for all links (in seconds). If a link does not download completely within the time specified by this value, then it is considered a timeout error. Default = 120 seconds.

NoImages = {0,1}

If non-zero, ignores all image links.

Verbose = {0,1}

Controls the verbosity of standard output as the loadworm processes. Use 0 for the greatest degree of quiet. Reports on the actual performance of the loadworm are created from a database the loadworm creates.

Harvest = {0,1}

Turns the harvesting of results from the loadslaves on or off. Turning it off improves manageability, since the slaves then do not need to maintain a record of the results. It also reduces disk thrashing when multiple loadslaves are running on one host. Harvest=0 is useful if you are monitoring the load on the server's side.

[Traverse]

Specifies the URL(s) that are the anchor(s) of the website to be tested by this loadworm execution.

[Ignore]

A list of regular expressions which, if matching a generated URL, will cause that URL to be ignored. For instance, .*\.netscape\.com would prevent the loadworm from traversing any link to the websites of Netscape. Note that if the URL is explicitly listed in the [Traverse] section, then any [Ignore] match will, in its turn, be ignored.

[ReferersReport]

A list of regular expressions which, when they match a generated URL, will record in a database all webpages that link to that URL.

[Validation]

Each link can be validated with a custom Perl subroutine. The subroutine is selected by matching the URL against a regex, and is given the URL and the resultant webpage. The validation routine can then verify the accuracy of the response, and can write to the loadworm database files to record successes and/or errors. In particular, the checks table is reserved for this. It is tied to the hash %main::Checks, conventionally keyed by URL, whose values are whatever string the validation routine wishes to report about that URL/response pair. A zero returned from the validation routine tells the loadworm to ignore all links within the page; a non-zero return allows normal processing to continue.

For example, a line matching any URL can call your subroutine, "Check", in your package "AnyURL.pm". AnyURL.pm must be in the @INC path, and must include a package statement (e.g. package AnyURL). See the example, AnyURL.pm, for details.
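As a sketch of what such a validation package can look like (the subroutine body below is an assumption; only the AnyURL/Check names, the argument list, the %main::Checks convention, and the return-value meaning come from the text above):

```perl
package AnyURL;
use strict;
use warnings;

# Called by the loadworm with the URL and the page it downloaded.
# Record a note in the checks table via %main::Checks, then return
# 0 to skip this page's links, or non-zero to process them normally.
sub Check {
    my ($url, $page) = @_;
    if ( $page =~ /Internal Server Error/i ) {
        $main::Checks{$url} = "error page returned";
        return 0;               # don't traverse links on an error page
    }
    $main::Checks{$url} = "OK";
    return 1;
}

1;
```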

[Proxy]

A URL specifying the location of the proxy for web access, if any.

[NoProxy]

Domain names for which the proxy is not to be used.

[Credentials]

Specifies a list of user ids and passwords for each of the realms that may require authentication. The "net location" and "realm" are separated by a slash, then the "user id" and "password" are separated by a comma. "Netlocation/realm" and "userid,password" are then associated with an equals sign, as in:

        webdev.savesmart.com/Test Server=MyID,twi9y

Here "webdev.savesmart.com" is the net location, "Test Server" is the realm, "MyID" is the user id, and "twi9y" is the password.

[Input]

Each line specifies a list of values to be iterated across whenever a URL and INPUT field name match the specified regular expressions. The list is specified as a Perl statement suitable for eval. This feature will later allow more elaborate input generation, but for now it allows the specification of a list of values via qw(list). For example:

        login.get,name=qw(test1)

The URL is matched against the first regex (before the comma), then the NAME of the INPUT field is matched against the second regex (following the comma). The list of values specified by the Perl statement (following the equals sign) is then iterated for the matched URL. The special syntax NULL is provided to allow the field to have a null value.

[Limit]

Each line specifies a regex that will match a URL, and the number of times that matching URLs should be visited in a LoadWorm traversal. Thus,

        owa/categories\.get=50
        owa/favorite\.get=10
        owa/cart\.get=5

The owa/categories.get CGI script will be called only 50 times in the traversal, owa/favorite.get only 10, owa/cart.get only 5, etc. The count applies to all URLs that match these regular expressions; it doesn't matter what the CGI parameters to these CGI scripts might be, the scripts themselves will be called only as many times as the [Limit] section specifies.

Example of a Configuration File

        [Mode]
        Harvest=1
        Depth=10
        Random=1
        Recurse=
        Timeout=30
        Verbose=0
        NoImages=0
        UserAgent=Mozilla/4.01 [en] (WinNT; I)
        Editor="C:\Program Files\TextPad\TxtPad32.exe"
        
        [Traverse]
        http://webdev.savesmart.com
        
        [Credentials]
        webdev.savesmart.com/Test Server=MyID,twi9y
        
        [Ignore]
        www\.
        www6\.
        maps\.
        justgo\.com
        netscape\.com
        /owa/go_home\.get.*
        
        [Limit]
        owa/categories\.get=50
        owa/favorite\.get=10
        owa/cart\.get=5
        owa/specials\.get=10
        owa/search\.get=10
        
        [ReferersReport]
        \.savesmart\.com:900\/
        favorite\.get
        
        [Validation]
        
        [Proxy]
        http://ssgw.savesmart.com
        
        [NoProxy]
        admin
        webdev.savesmart.com
        
        [Input]
        login.get,name=qw(test1)
        login.get,cardnumber=qw(test1234)
        login.get,email=NULL

THE RESULTS DATABASE

NOTE: This information is not current, but it gives you the general idea of what is possible once we tie up a few loose ends.

The results of a session of LoadWorm are recorded in a Perl accessible database. Although some information is printed to standard output as the session progresses, the most interesting results should be discovered by scanning the LoadWorm database for that session. The database consists of several hash-tied tables. Each table is keyed by the URL associated with it, and the value will be a string representing the result. For some of these tables, the result is an array of strings representing several interactions with that URL. Unfortunately, Perl's built-in Tie::Hash will not record arrays in a tied table. For these tables, the data is converted to ASCII text data and written to a sequential file. The Perl code listed below can be used to pull this sequential file back into a hashed array in your Perl report generator.

referers

This relates URLs of the website to the parent pages that contain them. $referers{$childURL} holds an array of URLs of pages that link to $childURL. (This table does not include images; these are recorded in the images table. It does include all ignored URLs.) Note: this file is not a hash-tied database file, but a sequential file containing data that can be imported into a hashed table with the Perl code listed below (tbl2hash.pl).

errors

This is a list of all the URLs that failed to download. $errors{$URL} is the error message associated with the attempt to download $URL.

ignores

This is a list of all URLs that were encountered in the website, but were ignored because they match some regular expression in the [Ignore] section of the configuration file. $ignore{$URL} is the regular expression that caused $URL to be included in this list.

timings

This text file records the time of each request, and the time of completion of that request. Each record consists of two (or more) lines. The first line contains the URL. The second line contains the start time, the finish time, and the size in a string like (hh:mm:ss.hh,hh:mm:ss.hh size). The size might be the string "FAILED", instead, indicating that the request failed. Then, the following lines will contain the reason for the failure, until a line containing a copy of the original "FAILED" line. Thus, timings includes the time for failed downloads as well as successful ones.
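As a sketch, the start/finish pair in a timings record can be turned into an elapsed time like this. The "(hh:mm:ss.hh,hh:mm:ss.hh size)" layout is taken from the description above; the sample line and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# Convert an "hh:mm:ss.hh" timestamp to seconds.
sub to_secs {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# A sample second line of a timings record, as described above.
my $line = "(10:15:02.50,10:15:04.75 1234)";
if ( my ($start, $finish, $size) = $line =~ /\(([\d:.]+),([\d:.]+) (\S+)\)/ ) {
    my $elapsed = to_secs($finish) - to_secs($start);
    printf "size=%s elapsed=%.2fs\n", $size, $elapsed;  # size=1234 elapsed=2.25s
}
```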

checks

This table is written by the user-customized validation routine(s).

   tbl2hash.pl

        %Linkages = ();
        open TBL, "<linkages" or die "Cannot open linkages: $!";
        while ( <TBL> )  {
                chomp;                          # strip the trailing newline
                if ( $_ !~ /^\s/ )  {           # unindented line: a new key (URL)
                        $ky = $_;
                }
                else {                          # indented line: a value for the current key
                        s/^\s*//;
                        push @{  $Linkages{$ky}  }, $_;
                }
        }
        close TBL;

NOTES

PREREQUISITES

These are the versions of Perl modules under which LoadWorm is known to work. It may be just fine with earlier or later versions.

AUTHOR

Glenn Wood, glenwood@alumni.caltech.edu.

Copyright 1997-1998 SaveSmart, Inc.

Released under the Perl Artistic License.

$Id: LoadWorm.pm,v 1.1.1.1 2001/05/19 02:54:40 Glenn Wood Exp $
