> Technically it should be possible, but you need to write another
> retriever spider for the engine that knows how to read the squid cache
> files instead of fetching from the web or indexing local files.
>
> The format of the cache files is described in the Programmers Guide and
> iirc there is even a Perl module on CPAN for reading these files.


That was my next question; i.e. how do I read the cache?
Do you by any chance know the name of the CPAN module?

I looked at CPAN and found the Cache-2.01 module, is this the one?

> The developer list for the preferred search engine is a better place to
> ask I think. No modifications to Squid are required, but the search
> engine needs to be slightly modified to know how to read the Squid cache
> data.
>
> Each file in the cache contains
>
> a) Metadata such as the URL of the file, size, time cached etc. Of these,
> the search engine needs to use the URL as the "name" of the indexed object.
>
> b) The object HTTP headers.
>
> c) The object contents. This is what needs to be indexed.
>
> b+c is the HTTP reply as received by Squid.
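
To make the b+c layout concrete, here is a minimal Python sketch that
splits a raw HTTP reply into its status line, headers, and body (the body
being the part to index). The function name split_http_reply is made up
for illustration. The sketch assumes the reply uses CRLF line endings and
has already been separated from the Squid metadata in part (a).

```python
def split_http_reply(reply: bytes):
    """Split a raw HTTP reply (parts b+c) into status line, headers, body.

    Assumption: headers end at the first blank line (CRLF CRLF).
    """
    head, sep, body = reply.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")

    lines = head.decode("latin-1").split("\r\n")
    status_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()

    return status_line, headers, body
```

An indexer would typically look at the content-type header to decide
whether the body is worth indexing at all (text/html yes, image/gif
probably not).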


When I run 'file' on a particular cache file, it reports DBase 3
format. Is this correct, or is this just the closest that 'file' on Linux
can get in determining the type? The real question is, how do I put the
cached file back into its original format, with its original title, for
presentation to the server?

I looked at the 'purge' utility written by Jens-S. Vöckler since it can
decipher the squid cache, but I don't understand how it works.

For example, I have a cache file:

/usr/local/squid/var/cache/00/09/0000092D

with header information:

^Co
Content-Length: 2173
Content-Type: image/gif
Last-Modified: Sun, 11 Jan 2004 05:20:46 GMT
Accept-Ranges: bytes
ETag: "5db8d2aa2d8c31:627d33"
Server: Microsoft-IIS/6.0
Date: Thu, 22 Jan 2004 03:02:01 GMT
Connection: close


and from that, the 'purge' utility returns the URL of:

http://www.whitehouse.org/kids/images/tn-palm.gif

How is the URL deciphered? For the life of me, I can't figure it out.

I read in the Programming Guide that "A cache swap file consists of two
parts: the cache metadata, and the object data."
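
A rough Python sketch of how the metadata part might be walked to recover
the URL follows. Everything about the layout here is an assumption drawn
from my reading of the Squid 2.x sources (src/store_swapmeta.c appears to
be where this is packed and unpacked), not a confirmed specification: a
marker byte 0x03, then a native int giving the total metadata length, then
type/length/value records where type 4 holds the NUL-terminated URL. The
names read_swap_meta and cached_url are made up for illustration. The "^Co"
at the top of the dump above would at least be consistent with this, since
^C is byte 0x03 and 'o' is 0x6f.

```python
import struct

STORE_META_OK = 0x03   # assumed marker byte at the start of a swap file
STORE_META_URL = 4     # assumed TLV type that carries the URL


def read_swap_meta(data: bytes, int_fmt: str = "=i"):
    """Return ({tlv_type: value}, metadata_length) for a cache file's bytes.

    int_fmt is the struct format of the length fields; "=i" assumes a
    4-byte int in the byte order of the machine that wrote the cache.
    """
    int_size = struct.calcsize(int_fmt)
    if len(data) < 1 + int_size or data[0] != STORE_META_OK:
        raise ValueError("missing metadata marker byte")

    (header_len,) = struct.unpack_from(int_fmt, data, 1)
    if header_len < 1 + int_size or header_len > len(data):
        raise ValueError("implausible metadata length")

    tlvs = {}
    pos = 1 + int_size
    while pos + 1 + int_size <= header_len:
        tlv_type = data[pos]
        (length,) = struct.unpack_from(int_fmt, data, pos + 1)
        pos += 1 + int_size
        if length < 0 or pos + length > header_len:
            raise ValueError("TLV record runs past the metadata")
        tlvs[tlv_type] = data[pos:pos + length]
        pos += length
    return tlvs, header_len


def cached_url(data: bytes):
    """Return (url, http_reply), where http_reply follows the metadata."""
    tlvs, header_len = read_swap_meta(data)
    url = tlvs[STORE_META_URL].rstrip(b"\0").decode("latin-1")
    return url, data[header_len:]
```

Under these assumptions, calling cached_url(open(path, "rb").read()) on a
cache file would give back the URL plus the raw HTTP reply (headers and
object) that follows the metadata.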

Could you please point me to the code in squid that will show me how to
get at and decipher the metadata?

I am sorry to be such a bother, but I get totally lost in the squid code,
so pointers to the correct modules to look in will be very much
appreciated.

Thanks,
Murrah Boswell