Tuesday, September 8, 2009

Re: [BLUG] tracking downloads of a single file

Let me preface this by saying that I normally use webalizer, awstats
and even some web analysis software I wrote myself, but num-utils is
another method I use when I want some quick flexibility and for
general command line use.

One of the reasons why I wrote the num-utils set of programs was for
doing special analytics like this when other programs weren't easy to
use for the same thing. In particular, I found it crazy that the
standard set of unix tools didn't have a program for just summing up
numbers easily and for "grepping numerically" instead of lexically.

http://suso.suso.org/xulu/Num-utils

It's also available in some distributions, like Debian, Ubuntu and
Gentoo. I haven't maintained it in a while, though, so if you find a
bug, let me know.

Here are some examples of how you could use the programs in real world
situations. All of these assume the Apache common log format.

1) A simple example to start with: you want to know the total number
of bytes downloaded from your website. The bytes column is the 10th
whitespace-separated field.

cat access_log | awk '{print $10}' | numsum


2) Now let's say you want to add up only the bytes consumed by
people downloading large files, say over 1MB (1048576 bytes):

cat access_log | awk '{print $10}' | numgrep /1048576../ | numsum


3) Now the same thing, but only for mp3 files.

cat access_log | grep "\.mp3 HTTP/1\." | awk '{print $10}' | numgrep /1048576../ | numsum

The HTTP/1\. is part of the expression to anchor the match to the end
of the request, so that a ".mp3" elsewhere in the line doesn't count.
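For the original question (stats on one particular file), the same idea
can be narrowed to a single request path. This is a minimal sketch using
plain awk rather than num-utils; the function name file_downloads is
hypothetical, and it assumes the common log format splits on whitespace
so that field 7 is the path, field 9 the status and field 10 the bytes
(which Apache writes as "-" when nothing was sent).

```shell
# Hypothetical helper: count successful (status 200) requests for one
# path and sum the bytes sent for them.
# Assumed fields: $7 = request path, $9 = status, $10 = bytes or "-".
file_downloads() {
    path="$1"
    log="$2"
    awk -v path="$path" '
        $7 == path && $9 == 200 {
            hits++
            if ($10 != "-") bytes += $10
        }
        # %.0f avoids integer overflow or exponent output on large totals
        END { printf "%d %.0f\n", hits, bytes }
    ' "$log"
}
```

Called as `file_downloads /files/song.mp3 access_log`, it prints the hit
count followed by the byte total. Partial downloads (status 206) are not
counted in this sketch.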

4) You can also use for loops to do things like display request traffic by day.

for day in `numrange /01..31/` ; do echo -n "$day: " ; grep " \[$day/Aug/2009:" access_log.2009.08 \
| wc -l ; done

Or the same, but for byte traffic per day.

for day in `numrange /01..31/` ; do echo -n "$day: " ; grep " \[$day/Aug/2009:" access_log.2009.08 \
| awk '{print $10}' | numsum ; done
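
The loops above scan the log once for every day. As a rough alternative,
a single awk pass can tally bytes per day. This is only a sketch: the
function name bytes_per_day is hypothetical, and it assumes field 4
looks like "[08/Aug/2009:10:00:00" and field 10 holds the bytes (or "-").

```shell
# Hypothetical helper: sum bytes per calendar day in one pass.
# Assumed fields: $4 = "[DD/Mon/YYYY:HH:MM:SS", $10 = bytes or "-".
bytes_per_day() {
    log="$1"
    awk '
        {
            split($4, t, ":")          # t[1] is "[DD/Mon/YYYY"
            day = substr(t[1], 2)      # drop the leading "["
            if ($10 != "-") bytes[day] += $10
            else bytes[day] += 0       # still list days with no bytes sent
        }
        END { for (d in bytes) printf "%s: %.0f\n", d, bytes[d] }
    ' "$log" | sort
}
```

The plain sort orders days correctly only within a single month, which
matches the one-file-per-month layout used above.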

You can also use the seq program to generate the range, but numrange
has the added benefit that it zero-pads numbers for you automatically,
which matters for the regex that matches the date in the logs.
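
If numrange isn't installed, seq can be made to pad as well. The sketch
below assumes a seq that supports -w (GNU coreutils and the BSDs do);
the function name requests_per_day and its arguments are hypothetical.

```shell
# Hypothetical helper: print request counts per day for one month.
# seq -w pads every number to the width of the largest one, so
# "seq -w 1 31" yields 01, 02, ... 31 (but "seq -w 1 9" would not pad).
requests_per_day() {
    month="$1"   # e.g. Aug/2009
    log="$2"
    for day in $(seq -w 1 31); do
        # grep -c prints the number of matching lines, 0 if none
        printf '%s: %s\n' "$day" "$(grep -c " \[$day/$month:" "$log")"
    done
}
```

For example, `requests_per_day Aug/2009 access_log.2009.08` prints one
line per day in the same "DD: count" shape as the loop above.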

Note: I wrote all these out without testing them so I may have goofed on
one of the examples.


Some of these things can be done with the stats software that is out
there, but this method will give you more flexibility if you need to
do something unique. If you could email me what exactly you are trying
to find or what problem you are facing that requires this information, I
may be able to explain how to get the results.


On Tue, Sep 08, 2009 at 02:46:04PM GMT, Ben Shewmaker [ben@shewbox.org] said the following:
> Two part question really:
>
> One, is there an easy way to track downloads of individual files on my
> website? I have a few files in particular I would like stats on, and I
> don't seem to have a way to do that at the moment. (I'm a bit clueless on
> this subject actually)
>
> Second, what stats program(s) do others on the list use? I like Google
> Analytics, but I'm curious if there are other notable programs that run
> locally vs. a tracking service in the cloud. Any good open source ones?
> I've been using Analog and Google so far. . .
>
> Ben

> _______________________________________________
> BLUG mailing list
> BLUG@linuxfan.com
> http://mailman.cs.indiana.edu/mailman/listinfo/blug


--
Mark Krenz
Bloomington Linux Users Group
http://www.bloomingtonlinux.org/