Logmonster - HTTP log file splitter, processor, sorter, etc.
Matt Simerson (email@example.com)
Creates a new Apache::Logmonster object. All methods in this module are OO. See t/Logmonster.t for working examples.
Checks to see if $local/etc/awstats is set up for awstats config files. If not, it creates it. Make sure to drop a custom copy of your site's awstats.model.conf file in there. Finally, it makes sure the $domain it was passed has an awstats file configured for it. If not, it creates one.
Perform sanity tests on the system. It will complain quite loudly if it finds things are not workable.
Compresses a file. Tests first to make sure the file exists and then compresses it using gzip. Pass it a hostname and a file and it compresses the file on the remote host. Uses SSH to make the connection, so you will need to have key-based authentication set up.
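The test-then-gzip behaviour described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the module's Perl code; the function names compress_local and remote_compress_command, and the use of ssh's BatchMode option, are assumptions made for this example.

```python
import gzip
import os
import shlex
import shutil


def compress_local(path):
    """Gzip `path` in place; return the new name, or None if it is missing."""
    if not os.path.isfile(path):
        return None  # the existence test failed, so there is nothing to do
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)  # mimic gzip, which replaces the original file
    return gz_path


def remote_compress_command(host, path):
    """Build the argv that tests for and gzips `path` on `host` over SSH."""
    quoted = shlex.quote(path)  # protect spaces and shell metacharacters
    remote = f"test -f {quoted} && gzip {quoted}"
    # BatchMode makes ssh fail instead of prompting when key-based
    # authentication is not set up (an assumption of this sketch).
    return ["ssh", "-o", "BatchMode=yes", host, remote]
```

The list returned by remote_compress_command could be handed to subprocess.run; a non-zero exit status would then mean that the file was missing or that the connection failed.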
Collects compressed log files from a list of servers into a working directory for processing.
feed_the_machine takes the sorted vhost logs and feeds them into the chosen stats processor.
Extracts a list of hosts from logmonster.conf, and then downloads log files from each to the staging area.
Determines where to fetch an interval's worth of logs from. Based upon the -i setting (hour, day, month), this sub figures out where to find the requested log files that need to be processed.
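As an illustration of the date arithmetic such a lookup involves, here is a minimal Python sketch (not the module's Perl code). It assumes that the logs wanted are those of the most recently completed interval, which the text above does not state, and it leaves out the mapping from a date to a directory because that layout is not described here. The function name previous_interval_start is hypothetical.

```python
from datetime import datetime, timedelta


def previous_interval_start(interval, now):
    """Return the start of the last completed hour, day, or month.

    ASSUMPTION: processing targets the previous, fully written interval.
    """
    if interval == "hour":
        top_of_hour = now.replace(minute=0, second=0, microsecond=0)
        return top_of_hour - timedelta(hours=1)
    if interval == "day":
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight - timedelta(days=1)
    if interval == "month":
        first = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Step back one day into the previous month, then snap to its 1st.
        return (first - timedelta(days=1)).replace(day=1)
    raise ValueError(f"unknown interval: {interval!r}")
```

The returned datetime can then be formatted with strftime into whatever year/month/day path components the log archive uses.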
report_hits reads a day's log results file and reports the results to standard out. The logfile contains key/value pairs like so:
matt.simerson:4054 www.tnpi.net:15381 www.nictool.com:895
This file is read by logmonster when called in -r (report) mode and is expected to be called via a monitoring agent (nrpe, snmpd, BB, etc.).
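A monitoring agent that consumes this output only needs to split each pair on its colon. The following Python sketch shows one way to parse the sample line above; it is an illustration rather than the module's Perl code, and the function name parse_hit_counts is hypothetical.

```python
def parse_hit_counts(text):
    """Parse whitespace-separated name:count pairs into a dict."""
    counts = {}
    for token in text.split():
        # Split on the last colon so the count is always the final piece.
        name, sep, hits = token.rpartition(":")
        if not sep or not hits.isdigit():
            continue  # skip anything that is not name:number
        counts[name] = int(hits)
    return counts


sample = "matt.simerson:4054 www.tnpi.net:15381 www.nictool.com:895"
for name, hits in parse_hit_counts(sample).items():
    print(f"{name}:{hits}")
```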
Appends information about referrer spam to the logmonster -v report. An example of that report can be seen here: http://www.tnpi.net/wiki/Referral_Spam
By now we have collected the Apache logs from each web server and split them up based on vhost. Most stats processors require the logs to be sorted in chronological order. So, we open up each vhost's logs for the day, read them into a hash, sort them based on their log entry date, and then write them back out.
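The in-memory sort described above can be sketched as follows. This is a Python illustration, not the module's Perl code, and it sorts a list by a key function rather than building hashes. It assumes each line carries the bracketed timestamp of Apache's common or combined log format; the names entry_time and sort_log_lines are hypothetical.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp of Apache's common/combined log formats,
# for example [10/Oct/2012:13:55:36 -0700].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def entry_time(line):
    """Return the timezone-aware datetime of one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in log line: {line!r}")
    # %b expects English month abbreviations under the default C locale.
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def sort_log_lines(lines):
    """Return the lines in chronological order."""
    # sorted() is stable, so entries sharing a timestamp keep input order.
    return sorted(lines, key=entry_time)
```

Because the parsed datetimes keep their UTC offsets, lines from servers in different time zones still compare correctly.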
After collecting the log files from each server in the cluster, we need to split them up based upon the vhost they were intended for. This sub does that.
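A minimal Python sketch of the splitting step follows. It is an illustration, not the module's Perl code, and it rests on one assumption that the text above does not confirm: that the vhost name is the last whitespace-separated field of every log line (for example, because %v was appended to the Apache LogFormat). The function name split_by_vhost and the "unmatched" bucket are hypothetical.

```python
from collections import defaultdict


def split_by_vhost(lines, known_vhosts):
    """Group log lines by vhost, returning {vhost: [lines]}."""
    per_vhost = defaultdict(list)
    for line in lines:
        fields = line.split()
        # ASSUMPTION: the vhost name is the final field of the line.
        vhost = fields[-1].lower() if fields else ""
        if vhost not in known_vhosts:
            vhost = "unmatched"  # hypothetical bucket for unknown vhosts
        per_vhost[vhost].append(line)
    return dict(per_vhost)
```

Each list in the result can then be written to its own per-vhost file, ready for sorting.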
Report to author.
Support for individual webalizer.conf file for each domain
Delete log files older than X days/months
Do something with error logs (other than just compress)
If files to process are larger than 10MB, find a nicer way to sort them rather than reading them all into a hash. Now I create two hashes, one with data and one with dates. I sort the date hash, and using those sorted hash keys, output the data hash to a sorted file. This is necessary as wusage and http-analyze require logs to be fed in chronological order. Take a look at awstats logresolvemerge as a possibility.
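One possible direction for the large-file item above is a streaming k-way merge, which holds only one pending line per input in memory. The Python sketch below illustrates that idea; it is not something the module currently does, and it assumes that each per-server log is already in chronological order on its own. The names entry_time and merge_sorted_logs are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Captures the contents of the first bracketed field, which in Apache's
# common/combined formats is the timestamp.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def entry_time(line):
    """Return the timezone-aware datetime of one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in log line: {line!r}")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_sorted_logs(*streams):
    """Lazily merge already-sorted iterables of log lines by timestamp."""
    # ASSUMPTION: every input stream is itself chronologically ordered.
    return heapq.merge(*streams, key=entry_time)
```

Passing open file handles as the streams and writing the merged lines straight to the output file avoids loading any whole log into memory.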
Copyright (c) 2003-2012, The Network People, Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of The Network People, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.