Per virtual host Apache hit stats in Cacti

Submitted by Falken on

You'll need to know your possible host names: httpd -t -D DUMP_VHOSTS | grep -E 'default server|namevhost'

        default server baz.boo.com (/etc/httpd/conf.d/web.com.conf:19) 
        port 443 namevhost foo.boo.com (/etc/httpd/conf.d/web.com.conf:19)
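
To boil that output down to a plain list of names (one per Cacti output param), something like the following may help. It is only a sketch: list_vhosts is a made-up helper name, and the field positions assume the output looks like the two sample lines above.

```shell
# list_vhosts: hypothetical helper. Reads "httpd -t -D DUMP_VHOSTS" output on
# stdin and prints each host name once. The field positions ($3 and $4) are an
# assumption based on the sample lines above.
list_vhosts() {
    awk '/default server/ { print $3 }
         /namevhost/      { print $4 }' | sort -u
}
```

You would pipe the httpd command above into it in place of the grep, i.e. httpd -t -D DUMP_VHOSTS | list_vhosts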

Then add a new data input method in Cacti that runs the Perl script below, saved as e.g. /usr/share/cacti/scripts/webhits2.pl

Set the command to perl <path_cacti>/scripts/webhits2.pl <file>

Make sure the user Cacti runs as can execute this file (chmod a+rx might do).

Add an input param for "file", and output params for each of your virtual hosts.

#!/usr/bin/env perl
# https://stackoverflow.com/questions/20649387/extract-last-10-minutes-from-logfile
use strict;
use Date::Parse;
my $ref;
my @row;
my %hosthits;
# Open (READ) the file submitted as 1st argument
open(IN, "<", $ARGV[0]) or die "Cannot open log file: $!\n";
seek IN,-1000,2;         # Jump to 1000 characters before end of logfile. (This
                         # may not suffice if the log file holds very long lines!)
while (<IN>) {           # Until end of logfile...
   $ref=str2time($1) if /\[(.{26})\]\s/;
};                       # store the time of the last entry in $ref.
seek IN,0,0;             # Jump back to the beginning of the file
while (<IN>) {
       # Count only entries from the 300 seconds before the last entry
       if( /\[(.{26})\]\s/&&str2time($1)>$ref-300 ){
               @row=split(" ");
               $hosthits{$row[0]}++;   # First field is the virtual host (%v)
       }
}                        # This closing brace was missing from the while loop
close IN;

# Print the host:hits pairs on one line
while (my ($k,$v)=each %hosthits){print "$k:$v "}
print "\n";
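
The script prints a single line of space-separated host:hits pairs, which is the name:value form Cacti expects from a script with several outputs. Before wiring it into Cacti, it can be worth checking the output by hand. The helper below is a rough sketch (check_pairs is a made-up name, not part of Cacti) that succeeds only if every item on its input looks like name:number.

```shell
# check_pairs: hypothetical helper. Exits 0 only if stdin holds at least one
# whitespace-separated token and every token has the form name:integer.
check_pairs() {
    awk '{
             n += NF
             for (i = 1; i <= NF; i++)
                 if ($i !~ /^[^:]+:[0-9]+$/) bad = 1
         }
         END { exit (bad || n == 0) }'
}
```

Pipe the script's output into it, run as the user Cacti runs as, and test the exit status. It also fails on an empty line, which is what the script prints when there were no hits in the last five minutes of the log.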

Next, use this new data input method to create a new data source. Note that you can't use full stops in the data source name (the RRD DS name), so use just the first part of the domain, or replace the full stops with underscores or something similar.
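
If you have many virtual hosts, the renaming can be scripted. The sketch below assumes RRDtool's documented rule that a DS name is 1 to 19 characters from letters, digits and underscore. ds_name is a made-up helper name, and truncating long names to 19 characters is one possible choice that can produce clashes between similar host names.

```shell
# ds_name: hypothetical helper. Turns a host name into a string that is safe to
# use as an RRD DS name: anything other than a letter, digit or underscore
# becomes "_", and the result is cut to 19 characters.
ds_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | cut -c1-19
}
```

For example, ds_name foo.boo.com prints foo_boo_com.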

Finally, configure Apache itself to write an access log in the correct format. The log's location must be the one you gave as the "file" input param.

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" ncsa 
CustomLog logs/access_log ncsa 

You can change the log format, as long as you also change the regular expression (used in both the first while loop and the if) that finds the date and time, and the index into @row (the space-separated items in each log line) that picks out the host name.
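
Before editing the Perl, you can check what the script will see in your log. The sketch below mirrors the script's two assumptions for the format above: the host name is the first space-separated item, and the timestamp is exactly 26 characters inside square brackets. log_fields is a made-up helper name.

```shell
# log_fields: hypothetical helper. For each log line on stdin that has a first
# field and a bracketed 26-character timestamp, prints "host timestamp".
# Lines that do not match are silently dropped, as the Perl script would skip them.
log_fields() {
    sed -n -E 's/^([^ ]+) .*\[(.{26})\] .*$/\1 \2/p'
}
```

Feeding it the last few lines of your access log (e.g. via tail) should print one host name and timestamp per line. If nothing comes out, the regular expression or the field position needs adjusting for your format.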