Re: [squid-users] Squid and Search Engines
> Technically it should be possible, but you need to write another
> retriever spider for the engine that knows how to read the Squid cache files
> instead of fetching from the web or indexing local files.
> The format of the cache files is described in the programmers guide and
> IIRC there is even a Perl module on CPAN for reading these files.
That was my next question; i.e. how do I read the cache?
Do you by any chance know the name of the CPAN module?
I looked at CPAN and found the Cache-2.01 module. Is this the one?
> The developer list for the preferred search engine is a better place to
> ask, I think. No modifications are required to Squid, but the search
> engine needs to be slightly modified to know how to read the Squid cache.
> Each file in the cache contains:
> a) Meta data like the URL of the file, size, time cached etc. Of this the
> search engine needs to use the URL as "name" of the indexed object.
> b) The object HTTP headers.
> c) The object contents. This is what needs to be indexed.
> b+c is the HTTP reply as received by Squid.
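
A minimal sketch of reading part (a), under assumptions that are not stated in this thread and should be checked against the Squid source: that a Squid 2.x cache file starts with one marker byte (0x03), followed by a native-endian 4-byte total metadata length, followed by type/length/value records in which type 4 carries the URL. The names `read_swap_meta` and `cached_url` are made up for illustration.

```python
import struct

# Both constants are ASSUMPTIONS about the Squid 2.x on-disk layout;
# verify them against the Squid source for your version.
STORE_META_OK = 0x03   # assumed marker byte at offset 0
STORE_META_URL = 4     # assumed record type that holds the URL


def read_swap_meta(data: bytes):
    """Parse the metadata block at the start of one cache file.

    Returns (records, header_len), where records is a list of
    (type, value) pairs and header_len is the offset at which the
    stored HTTP reply is assumed to begin.
    """
    if len(data) < 5 or data[0] != STORE_META_OK:
        raise ValueError("unrecognised swap file header")
    # "=i" means native byte order, standard 4-byte int.
    (header_len,) = struct.unpack_from("=i", data, 1)
    if header_len < 5 or header_len > len(data):
        raise ValueError("implausible metadata length")

    records = []
    pos = 5
    while pos + 5 <= header_len:
        rec_type = data[pos]
        (rec_len,) = struct.unpack_from("=i", data, pos + 1)
        pos += 5
        if rec_len < 0 or pos + rec_len > header_len:
            raise ValueError("truncated metadata record")
        records.append((rec_type, data[pos:pos + rec_len]))
        pos += rec_len
    return records, header_len


def cached_url(data: bytes):
    """Return the URL stored in the metadata, or None if absent."""
    records, _ = read_swap_meta(data)
    for rec_type, value in records:
        if rec_type == STORE_META_URL:
            # The value is assumed to be NUL-terminated.
            return value.rstrip(b"\0").decode("latin-1")
    return None
```

If the layout assumption holds, `data[header_len:]` would be parts (b) and (c), that is, the HTTP reply as received by Squid.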
When I run 'file' on a particular cache file, it reports DBase 3 format.
Is this correct, or is this just the closest that Linux can get at
determining the type of file? The real question is, how do I put the
cached file back into its original format, with its original title, for
presentation to the server?
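
A rough sketch of the "back into its original format" step, assuming the metadata has already been skipped so that what remains is the HTTP reply exactly as received by Squid. Recovering the original content then means cutting the reply at the blank line that ends the HTTP headers. The function name `split_http_reply` and the status line in the usage example are illustrative only. If the cache file really does begin with a 0x03 marker byte (an assumption about the Squid 2.x layout), that would probably explain the DBase 3 guess from 'file', since that format is recognised by a similar leading byte.

```python
def split_http_reply(reply: bytes):
    """Split a stored HTTP reply into (status_line, headers, body).

    `reply` is assumed to be the bytes that follow the Squid metadata
    in a cache file. Header names are lower-cased; if a header is
    repeated, only the last value is kept.
    """
    head, sep, body = reply.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line ending the HTTP headers")

    lines = head.split(b"\r\n")
    status_line = lines[0].decode("latin-1")

    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values such as dates
        # contain colons of their own.
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("latin-1")
        headers[key] = value.strip().decode("latin-1")
    return status_line, headers, body


if __name__ == "__main__":
    sample = (
        b"HTTP/1.0 200 OK\r\n"  # illustrative status line
        b"Last-Modified: Sun, 11 Jan 2004 05:20:46 GMT\r\n"
        b"Date: Thu, 22 Jan 2004 03:02:01 GMT\r\n"
        b"\r\n"
        b"<html>hello</html>"
    )
    status, headers, body = split_http_reply(sample)
    print(status, headers["date"], len(body))
```

The body is the object contents to hand to the indexer. The name to index it under would come from the URL in the metadata, not from the cache file name.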
I looked at the 'purge' utility written by Jens-S. Vöckler, since it can
decipher the Squid cache, but I don't understand how it works.
For example, I have a cache file:
with header information:
Last-Modified: Sun, 11 Jan 2004 05:20:46 GMT
Date: Thu, 22 Jan 2004 03:02:01 GMT
and from that, the 'purge' utility returns the URL of:
How is the URL deciphered? For the life of me, I can't figure it out.
I read in the Programming Guide that "A cache swap file consists of two
parts: the cache metadata, and the object data."
Could you please point me to the code in squid that will show me how to
get at and decipher the metadata?
I am sorry to be such a bother, but I get totally lost in the Squid code,
so a pointer to the correct modules to look in would be very much
appreciated.