[9fans] General Question: 9fans.mbox archive and problem solving w/computers - Plan9
-
[9fans] General Question: 9fans.mbox archive and problem solving w/computers
Just thinking about this probably fairly simple task, but it
seems a bit overwhelming. Suppose you want a searchable
archive of 9fans and all you have to start with is this single
150 MB file of ~46,000 messages. What do you do? The
answer isn't obvious to me.
Greg
P.S. I'm prepared to rely on existing searchable archives to
do this in real life. It's just that in the few minutes' thought I've
given to this problem, I've become much more impressed
with those existing searchable archives and their very
quick responses.
-
Re: [9fans] General Question: 9fans.mbox archive and problem solving w/computers
most machines these days have 10x that much memory. it should
be speedy enough to use strstr(2) once you've loaded them into
memory. and even loading them into memory should take no
more than a few seconds at 80MB/s.
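A minimal sketch of the in-memory approach, in portable C rather than Plan 9 C. It assumes messages are separated by lines starting with "From " (the usual mbox convention) and that the file has no embedded NUL bytes; the names slurp and count_matching_messages are made up for illustration and come from nowhere in this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a NUL-terminated heap buffer; NULL on failure. */
char *slurp(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long n = ftell(f);
    if (n < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    char *buf = malloc((size_t)n + 1);
    if (buf != NULL) {
        size_t got = fread(buf, 1, (size_t)n, f);
        buf[got] = '\0';
    }
    fclose(f);
    return buf;
}

/*
 * Count the messages that contain needle.
 * Assumption: a new message begins wherever a line starts with "From ".
 * The buffer is modified temporarily (and restored), so it must be writable.
 */
size_t count_matching_messages(char *mbox, const char *needle)
{
    size_t hits = 0;
    char *msg = mbox;

    while (msg != NULL && *msg != '\0') {
        /* Find the start of the next message, if any. */
        char *next = strstr(msg, "\nFrom ");
        if (next != NULL)
            *next = '\0';           /* fence off the current message */
        if (strstr(msg, needle) != NULL)
            hits++;
        if (next != NULL) {
            *next = '\n';           /* restore the buffer */
            msg = next + 1;
        } else {
            msg = NULL;             /* that was the last message */
        }
    }
    return hits;
}
```

Calling slurp on the mbox file and passing the result to count_matching_messages with a search string gives the number of matching messages; the search is case-sensitive and looks at headers as well as bodies.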
a more elegant solution would be to reduce each document to
a set of stemmed words, enumerate the set of all stems in all
documents and create a bit array mapping stems to message #.
but that seems like too much work for only 150MB.
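A rough sketch of the index idea, under loud assumptions: lowercasing stands in for a real stemmer, terms are found by linear search where a real index would use a hash table, and the limits MAXTERMS, MAXMSGS and MAXWORD are arbitrary small values. All names here (wordindex, index_message, has_word) are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

enum { MAXTERMS = 4096, MAXMSGS = 1024, MAXWORD = 32 };

struct term {
    char word[MAXWORD];
    uint8_t bits[MAXMSGS / 8];  /* bit m set => message m contains word */
};

/* Must start zeroed (e.g. a static object) so that nterms is 0. */
struct wordindex {
    struct term terms[MAXTERMS];
    size_t nterms;
};

/* Stand-in for stemming: lowercase only (an assumption, not a stemmer). */
static void normalize(char *w)
{
    for (; *w != '\0'; w++)
        *w = (char)tolower((unsigned char)*w);
}

/* Linear lookup; optionally append a new zeroed term if there is room. */
static struct term *lookup(struct wordindex *ix, const char *word, int create)
{
    for (size_t i = 0; i < ix->nterms; i++)
        if (strcmp(ix->terms[i].word, word) == 0)
            return &ix->terms[i];
    if (!create || ix->nterms == MAXTERMS)
        return NULL;
    struct term *t = &ix->terms[ix->nterms++];
    memset(t, 0, sizeof *t);
    strcpy(t->word, word);      /* callers keep word shorter than MAXWORD */
    return t;
}

/* Split text into alphanumeric words and set bit msg for each of them. */
void index_message(struct wordindex *ix, const char *text, unsigned msg)
{
    char word[MAXWORD];
    size_t n = 0;

    if (msg >= MAXMSGS)
        return;
    for (;; text++) {
        if (isalnum((unsigned char)*text)) {
            if (n < MAXWORD - 1)    /* overlong words are truncated */
                word[n++] = *text;
            continue;
        }
        if (n > 0) {
            word[n] = '\0';
            normalize(word);
            struct term *t = lookup(ix, word, 1);
            if (t != NULL)
                t->bits[msg / 8] |= (uint8_t)(1u << (msg % 8));
            n = 0;
        }
        if (*text == '\0')
            break;
    }
}

/* Return 1 if message msg was indexed with the (normalized) query word. */
int has_word(struct wordindex *ix, const char *query, unsigned msg)
{
    char word[MAXWORD];

    if (msg >= MAXMSGS)
        return 0;
    strncpy(word, query, MAXWORD - 1);
    word[MAXWORD - 1] = '\0';
    normalize(word);
    struct term *t = lookup(ix, word, 0);
    return t != NULL && ((t->bits[msg / 8] >> (msg % 8)) & 1);
}
```

With one bit array per term, a multi-word query reduces to ANDing the arrays of its terms, which is where this layout would pay off on a larger archive.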
- erik