Apache::Logmonster - Log Processing Utility
Author: Matt Simerson.
Added Params::Validate dependency to Makefile.PL
Fixed bug where time offset was being ignored.
Checks more locations for awstats.pl (needs to become a config file setting).
Removed an unnecessary dependency on Mail::Toaster in Utility.pm.
Added OUTPUT_AUTOFLUSH because printing to *STDERR flushes immediately and the print statements mixed in with it did not. Email report is better formatted now.
debug was not being set properly; report formatting tweaks
added _progress sub
This new version is mostly about code quality and maintainability (not new features). The large chunks of code have been modularized into smaller subroutines and tests have been written to test the functionality of each sub. There are now 102 different tests (was 23) in the test suite. Added t/Test-coverage.pl, t/Logmonster.t, t/pod.t, t/pod-coverage, t/00.load
Nearly all the "working" code has been moved into lib/Apache/Logmonster.pm. Logmonster.pl is now a "shell" consisting of a little bit of code and a lot of documentation.
All the functions are now Object Oriented. Time will tell if that is A Good Thing[TM] but it makes reading the code and understanding where all the calls are going much, much easier. Added doc/*
The documentation has been significantly updated, addressing many of the common questions and comments I have received.
The reporting has been overhauled. You still get the same information, but by default it runs entirely silently if everything is okay. A single -v will output status messages that make for a nice bird's-eye view of your web log traffic. You can add additional -v options for even more verbose reporting.
Interface change: instead of -m for month, -d for day, and -h for hour, you use a -i [hour|day|month] option. The old -mdh options are officially deprecated but will continue to work for the indefinite future.
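A minimal sketch of how the new -i option and the deprecated -m, -d and -h flags could map onto one interval value. It uses Getopt::Std (the option parser named later in this file); the helper name interval_from_args is hypothetical and this is not Logmonster's actual code.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: returns 'hour', 'day' or 'month', or undef if no
# valid interval was requested.
sub interval_from_args {
    my @args = @_;
    local @ARGV = @args;    # getopts() reads from @ARGV
    my %opt;
    getopts( 'i:mdh', \%opt ) or return;

    # Preferred form: -i hour|day|month
    if ( defined $opt{i} ) {
        return $opt{i} if $opt{i} =~ /\A(?:hour|day|month)\z/;
        return;             # unrecognized interval name
    }

    # Deprecated single-letter flags, still accepted
    return 'month' if $opt{m};
    return 'day'   if $opt{d};
    return 'hour'  if $opt{h};
    return;
}
```

Called as interval_from_args(@ARGV), both "-i day" and the older "-d" would yield the same value, so the rest of a script only has to deal with one form.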
synced lib/Apache/Logmonster/Perl & Utility with Mail::Toaster 5 versions.
new feature submitted by Gernot Hueber.
new feature funded by Lewis Bergman: statsdir can be automatically created if it does not exist. Set statsdir_policy = create in logmonster.conf to enable this feature.
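As a rough illustration of the statsdir_policy = create behaviour described above, the sketch below creates a missing stats directory only when the policy asks for it. The helper name ensure_statsdir is hypothetical, and the real implementation in Logmonster may differ.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper, not Logmonster's actual code.
# Returns 1 if $statsdir exists (or was created), 0 otherwise.
sub ensure_statsdir {
    my ( $statsdir, $policy ) = @_;

    return 1 if -d $statsdir;

    # Only create the directory when the policy is 'create'
    return 0 unless defined $policy && $policy eq 'create';

    # make_path() creates intermediate directories and dies on failure,
    # so trap the error and report via the return value instead
    eval { make_path($statsdir) };
    return -d $statsdir ? 1 : 0;
}
```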
New feature by Lewis: Instead of having a full awstats.conf file for each vhost, each vhost has its own file that contains only specific info for that vhost and includes the generic /etc/awstats/awstats.conf file with the "global" settings.
statsdir can now be an absolute path (ex: /var/www/html). If so, processor output will be stored in statsdir/vhost. Otherwise, it works as it used to and output goes to vhost/statsdir. This is useful if you have a separate machine (not the web server) that does the processing and that system does not have access to the vhost docroot.
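The path rule above can be sketched with File::Spec. The helper name stats_output_dir and the $docroot parameter are assumptions made for illustration (the entry only says "vhost/statsdir"); this is not the code Logmonster ships.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: decide where processor output for a vhost goes.
#   $statsdir - the statsdir setting from logmonster.conf
#   $vhost    - the vhost name, e.g. 'www.example.com'
#   $docroot  - the vhost's document root (assumed to be what "vhost" means)
sub stats_output_dir {
    my ( $statsdir, $vhost, $docroot ) = @_;

    # Absolute statsdir: output goes to statsdir/vhost
    return File::Spec->catdir( $statsdir, $vhost )
        if File::Spec->file_name_is_absolute($statsdir);

    # Relative statsdir: old behaviour, output goes to vhost/statsdir
    return File::Spec->catdir( $docroot, $statsdir );
}
```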
* you can now select which stats processor is used for each virtual host. Create a .processor file in the stats dir and place the name of the processor on the first line.
* test to make sure the log file exists before trying to compress it (suppresses spurious warnings)
* If vhosts are all in a directory, skip any files that end with ~ (vim) or .bak (user).
* the code that hashed the domain list collected the settings from httpd.conf and stuffed them into a hash with the name of the vhost. This worked great, as long as ServerName was the first declaration in your httpd.conf. I discovered this fails otherwise. Now it stuffs the settings into a hash and then, after all the data is collected, moves the data into a new hash keyed off the vhost servername. The function now works regardless of order in the vhost container.
* If a perl module was missing, the script would fail after attempting to load Mail::Toaster::Perl (which may not exist). Added Apache::Logmonster::Perl to distro.
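The ServerName ordering fix in the list above can be sketched as follows: settings are collected into a temporary hash while a vhost container is open, and only when the container closes are they stored under the ServerName. The function name and the simplified directive parsing are assumptions for illustration, not the actual Logmonster code.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns a hash ref keyed by lower-cased ServerName.
sub vhosts_by_servername {
    my @lines = @_;
    my ( %by_name, $current );

    for my $line (@lines) {
        if ( $line =~ /^\s*<VirtualHost\b/i ) {
            $current = {};    # start collecting a new container
        }
        elsif ( $line =~ m{^\s*</VirtualHost>}i ) {
            # Container is complete: key it by ServerName, whatever
            # position that directive had inside the container
            $by_name{ lc $current->{servername} } = $current
                if $current && defined $current->{servername};
            undef $current;
        }
        elsif ( $current && $line =~ /^\s*(\w+)\s+(.+?)\s*$/ ) {
            $current->{ lc $1 } = $2;    # e.g. documentroot => '/var/www/a'
        }
    }
    return \%by_name;
}
```

Because the key is chosen only at the closing tag, a container that lists DocumentRoot before ServerName is handled the same way as one that lists ServerName first.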
package is now named Apache::Logmonster to fit nicely into a CPAN category
bundled up for CPAN & freshmeat release
Makefile.PL: updated package NAME, removed MATT::* dependency, added Compress::Zlib dep
logmonster.pl: updated package name
added more example settings to logmonster.conf
cleaned up pod docs for prettier web page formatting
added TODO file
remove MATT::Bundle reference from FAQ
updated lib/Logmonster/Utility to latest
fixed get_the_date bug in Utility
added many more tests for Utility
fixed a bug in my fileparse call (File::Basename)
Removed MATT::Utility dependency
added lib/logmonster/Utility (logmonster::Utility)
removed Exporter
updates for use with logmonster::Utility
replaced StripLastDirFromPath with File::Basename
Raymond Dijkxhoorn suggested not sorting the files if there is only one host. Shucks, that's a reasonable enough thing to do, for those of you with only one web server. ;-) Logmonster will now dutifully skip sorting logs if only one hostname is configured in logmonster.conf
allow for ServerName to have a :80 style suffix
added verbose (-v) flag.
fixed up reporting so quiet mode is really quiet unless there are errors
normal output is prettier
debugging output is much prettier
apache config file parsing is now much more versatile
if you have a folder full of files for vhosts, you can have multiple vhosts within a file now
misc internal changes for efficiency
prototyped all subroutines
added additional comments here and there
added test for FileHandle
Added inline documentation to a few of the subs
Modified SortVhostLogs so that it uses much less memory by writing to the log file as we sort instead of building an array and then writing the array contents to the file
in GetDomainList, I forgot to add domains without aliases to %domains
GetServerName: made the regexp search more reliable (no known problems but the potential existed).
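For the ServerName entry above (a :80 style suffix), one way to accept an optional port could look like the sketch below. The function name is hypothetical and the pattern is an illustration, not the regexp Logmonster actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the host name from a ServerName directive,
# dropping an optional ":port" suffix such as ":80".
sub servername_from_line {
    my ($line) = @_;
    return unless $line =~ /^\s*ServerName\s+(\S+)/i;
    my $name = $1;
    $name =~ s/:\d+\z//;    # strip a trailing :port if present
    return lc $name;
}
```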
Switched date parsing regexp from / / to /\s+/ in SortVhostLogs per Earl Ruby (firstname.lastname@example.org) for compatibility with Apache::Registry
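The difference between / / and /\s+/ is easiest to see with split. The sketch below is only an illustration of the two patterns; the actual expression in SortVhostLogs is not shown in this file, and the function names are hypothetical.

```perl
use strict;
use warnings;

# / / matches exactly one space, so a run of two spaces yields an
# empty field between them.
sub fields_single_space {
    my ($line) = @_;
    return split / /, $line;
}

# /\s+/ matches any run of whitespace (spaces or tabs) as one separator.
sub fields_any_whitespace {
    my ($line) = @_;
    return split /\s+/, $line;
}
```

A log line whose fields are separated by a tab or by more than one space splits into the expected fields only with the second form.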
Added regexp notes
Added additional debugging
Check for files locally before trying to compress them
Updated regexp to support numeric vhosts
Updated logmonster.pl pod documentation
Updated Makefile.PL, added README, FAQ files
Added BSD copyright
Corrected a typo in the logmonster.pl config file.
Fixed problem where paths with caps weren't detected (search string was lower cased)
Updated documentation and web site. More informative.
Added a strip leading spaces function to GetVhostsFromFile
Adjusted so FindTheBin will find awstats in its default location (/usr/local/www/cgi-bin/awstats.pl)
Fixed a problem with HitsPerVHost not getting written
Fixed a mis-feature where running logmonster -r was clobbering the active log processing dir. Oops.
Fixed a couple problems related to interaction between script and MATT::Utility
Made quite a few failures more graceful. This later proved to be only beneficial from a theoretical standpoint as it didn't solve the problem I was tracking down.
Updated FetchLogFiles so that you don't need to ssh to localhost
Updated URLs from matt.simerson.net to tnpi.biz
Moved configuration from script to logmonster.conf
Added Changes to the CVS depot
Added support for AWStats log processor
Added lots of documentation to logmonster.conf
Added pod documentation. Cleaned up SysCmd calls and now SysCmd calls use MATT::Utility
Moved many subs out of script into modules
Report hit counts (-r designed to be used with SNMP and RRDutil)
Moved code out of FetchLogFiles to GetTheLogDir for reuse
Writes counters to $logdir/HitsPerVhost.txt
Writes activity log to $logdir/Logmonster.txt
Renamed $logdir to $logbase
Moved FAQ and Changelog to web site. Created web page for logmonster. Added support for http-analyze. It should work but I haven't used http-analyze in about 5 years so I might be forgetting something.
Added the httpd.conf parsing stuff. Now works with a vhost directory or parses out of your httpd.conf - cool :)
Moved vhost log pre-processor checks out of FeedTheMachine into CheckStatsDir. Run it before SortVhostLogs so we skip sorting any logs that we aren't going to store. Count up invalid lines in log files and report them instead of dying when we encounter them. Print out prettier logs.
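Counting invalid lines instead of dying could be sketched like this. The log line pattern (a vhost name followed by a common-log-style entry with a bracketed timestamp) is an assumption made for the example, and the function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: tally valid and invalid lines rather than
# aborting on the first line that does not parse.
sub count_log_lines {
    my @lines = @_;
    my %count = ( valid => 0, invalid => 0 );

    for my $line (@lines) {
        # Assumed format: "vhost host ident user [timestamp] ..."
        if ( $line =~ /^\S+ \S+ \S+ \S+ \[[^\]]+\]/ ) {
            $count{valid}++;
        }
        else {
            $count{invalid}++;    # reported later, not fatal
        }
    }
    return \%count;
}
```

The caller can then print the invalid total in its report and carry on with the lines that did parse.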
An entire re-write
Logic is much cleaner now and way more efficient
Pulls most settings out of apache config files
Reporting is much better :)
Added -b (process archived logs) feature
Added -h & -m (hourly & monthly processing)
Added -n (dry run) so you can preview what it'll do
Now clean enough that I'll make it publicly available
Major code cleanups
Reworked fetch_the_files
Added system_command
Added FindTheBin
Added use strict (forced code cleanup)
Added Getopt::Std instead of custom parser
Moved $quiet to $opt_q, $debug to $opt_v
Made it work with matt.simerson.net
Cleaned up the code, added debug flag, expanded logic so v1.2 can replicate 1.0 & 1.1 behaviour with options
Added support for multiple domains