Great script and good advice indeed.
The other problem now is that running this 'cleaning' script takes an
estimated 40 minutes on my proxy machine (60 MB of logs, the result of
2 very active days; normally that much takes a week). Should I perhaps
rotate the Squid logs, then, and run this script daily from crontab on
the freshly rotated log only? I think this would be a solution.
Any other ideas?
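
A minimal sketch of that daily approach, assuming the cleaning step is the awk field filter quoted further down in this thread. Every path here (the squid binary, the log directory, the report file) and the one-minute wait are assumptions for illustration and need adjusting to the local install. It also assumes logfile_rotate in squid.conf is greater than zero, so that "squid -k rotate" leaves the previous log as access.log.0.

```shell
#!/bin/sh
# Sketch only: all paths below are assumed, not taken from a real setup.
SQUID=${SQUID:-/usr/local/squid/sbin/squid}
CALAMARIS=${CALAMARIS:-/usr/local/squid/bin/calamaris}
LOGDIR=${LOGDIR:-/usr/local/squid/var/logs}
REPORT=${REPORT:-/var/www/html/calamaris-daily.html}

# Keep the first 10 fields of each entry, then build the HTML report.
# $1 = log file to read, $2 = report file to write.
report_log() {
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' "$1" \
        | "$CALAMARIS" -F html > "$2"
}

# Rotate the logs, then process only the freshly rotated file.
daily_run() {
    "$SQUID" -k rotate || return 1
    sleep 60    # assumed grace period so squid can finish rotating
    report_log "$LOGDIR/access.log.0" "$REPORT"
}

# Only act when called with the argument "run", e.g. from cron.
if [ "${1:-}" = "run" ]; then
    daily_run
fi
```

Saved under a hypothetical name such as /usr/local/bin/squid-daily-report.sh, it could be scheduled with a crontab entry like "5 0 * * * /usr/local/bin/squid-daily-report.sh run". Each run then only has to parse one day of traffic instead of the whole accumulated log.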

Thanks,
Endre.





Kirk Schneider wrote on 03/01/2004 07:11 PM:
cc: squid-users@squid-cache.org
Subject: Re: [squid-users] Calamaris

Endre,

I have contacted the Calamaris author before on this and he has
suggested filtering the extra fields that smartfilter adds at
the end.

Now I run this on all my logs before piping to calamaris:

awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' access.log |calamaris
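
To illustrate what this filter does, here is a small sketch that runs it on one entry copied from the sample log in the original message. The function name trim_fields is made up for the example; the awk program is the one shown above.

```shell
# Keep only the first 10 whitespace-separated fields of each log entry.
# Reads the files given as arguments, or stdin when there are none.
trim_fields() {
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' "$@"
}

# One SmartFilter-extended entry from the sample log in this thread.
sample='1077780471.441 93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites'

printf '%s\n' "$sample" | trim_fields
# prints:
# 1077780471.441 93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html
```

The second content type, the ALLOW verdict and the category name (which may itself contain spaces) are dropped, leaving the ten fields of a plain Squid access.log entry.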


--
Kirk Schneider 972-952-4645 (work)
Raytheon Corporate IT Security 214-912-8679 (cell)
kschneider@raytheon.com 888-431-7621 (pager)

"If you think the problem is bad now just wait until we've solved it."



-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi
To: squid-users@squid-cache.org

Hello List,

I have a problem with Calamaris (v2.58).

I am using squid 2.5stable3, compiled from sources, with the SmartFilter
plugin. As far as I know, I have to use the squid-extended input type
for this, but that produces some errors:

[root@localhost logs]# date;cat test.log | /usr/local/squid/bin/calamaris -f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar 1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar 1 17:48:05 CET 2004
[root@localhost logs]#
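
The error names input line 369578 and a byte 0xf3, which lies outside plain ASCII. A rough sketch for looking at that entry and for counting how many entries contain such bytes follows; the helper names show_line and count_non_ascii are invented for this example, and test.log is the file from the command above.

```shell
# Print line number $2 of file $1.
show_line() {
    sed -n "${2}p" "$1"
}

# Count the lines of file $1 that contain any byte outside the
# printable ASCII range (space through tilde). Tabs count as well.
count_non_ascii() {
    LC_ALL=C grep -c '[^ -~]' "$1"
}

# Example use against the log from the transcript above:
#   show_line test.log 369578 | od -c
#   count_non_ascii test.log
```

Piping the suspect line through od -c shows the raw bytes, which helps to tell whether the odd character sits in the URL or in one of the fields appended by the filter plugin.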

The generated HTML file is an empty page.

A sample from the logfile:

1077780471.441 93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.466 64 3.227.65.74 TCP_MISS/200 1722 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.479 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.508 59 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.699 73 3.227.65.74 TCP_MISS/200 1585 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.713 83 3.227.65.74 TCP_MISS/200 1607 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.726 86 3.227.65.74 TCP_MISS/200 1589 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.885 256 3.227.65.74 TCP_MISS/200 726 GET http://as.fotexnet.hu/adserver.ads/153/0///937480 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.212 229 3.227.65.74 TCP_MISS/200 23713 GET http://index.hu/ad/lipton/banner1_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.388 279 3.227.65.74 TCP_MISS/200 17697 GET http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439 106 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.458 47 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.480 368 3.227.65.74 TCP_MISS/200 4292 GET http://as.fotexnet.hu/adserver.ads/196/0///27236 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.643 162 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.646 144 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.673 487 3.227.65.74 TCP_MISS/200 10319 GET http://as.fotexnet.hu/adserver.ads/200/0///378158 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.799 280 3.227.65.74 TCP_MISS/200 26216 GET http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819 122 3.227.65.74 TCP_MISS/200 216 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.824 124 3.227.65.74 TCP_MISS/200 355 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.842 136 3.227.65.74 TCP_MISS/200 1603 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.846 47 3.227.65.74 TCP_MISS/200 353 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites

Am I doing something wrong?

Thanks,
Endre.