[9fans] General Question: 9fans.mbox archive and problem solving w/computers - Plan9

  1. [9fans] General Question: 9fans.mbox archive and problem solving w/computers

    I've been thinking about what is probably a fairly simple task,
    but it seems a bit overwhelming. Suppose you want a searchable
    archive of 9fans and all you have to start with is this single
    150 MB file of ~46,000 messages. What do you do? The
    answer isn't obvious to me.

    Greg

    P.S. I'm prepared to rely on existing searchable archives to
    do this in real life. It's just that in the few minutes' thought I've
    given to this problem, I've become much more impressed
    with those existing searchable archives and their very
    quick responses.

  2. Re: [9fans] General Question: 9fans.mbox archive and problem solving w/computers

    most machines these days have 10x that much memory. it should
    be speedy enough to use strstr(2) once you've loaded them into
    memory. and even loading them into memory should take no
    more than a few seconds at 80MB/s.
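
A minimal sketch of the load-it-all-and-scan approach, written in Python rather than Plan 9 C, with byte-string containment standing in for strstr. It assumes the archive follows the usual mbox convention that every message starts at a line beginning with "From "; the names split_mbox and search are made up for illustration.

```python
def split_mbox(data: bytes) -> list[bytes]:
    """Split raw mbox bytes into messages at lines that begin with b'From '."""
    starts = [0] if data.startswith(b"From ") else []
    pos = data.find(b"\nFrom ")
    while pos != -1:
        starts.append(pos + 1)  # skip the newline; the message starts at 'From '
        pos = data.find(b"\nFrom ", pos + 1)
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


def search(messages: list[bytes], needle: bytes) -> list[int]:
    """Return the numbers of all messages that contain needle (case-sensitive)."""
    return [i for i, msg in enumerate(messages) if needle in msg]
```

To try it on the real archive (the file name here is an assumption), read the whole file once with open("9fans.mbox", "rb").read(), pass the result to split_mbox, and keep the list around between queries so the load cost is paid only once.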

    a more elegant solution would be to reduce each document to
    a set of stemmed words, enumerate the set of all stems in all
    documents and create a bit array mapping stems to message #.
    but that seems like too much work for only 150MB.
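
The stem-to-message mapping described above might look roughly like the following Python sketch. Two things here are assumptions and not part of the original suggestion: the stem function is a crude suffix stripper standing in for a real stemmer, and each "bit array" is a Python integer in which bit i is set when message i contains the stem. The names stem, build_index and query are hypothetical.

```python
import re

WORD = re.compile(r"[a-z0-9]+")


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(messages: list[str]) -> dict[str, int]:
    """Map each stem to an int used as a bit array over message numbers."""
    index: dict[str, int] = {}
    for i, text in enumerate(messages):
        # A set, so each stem is recorded once per message.
        for s in {stem(w) for w in WORD.findall(text.lower())}:
            index[s] = index.get(s, 0) | (1 << i)
    return index


def query(index: dict[str, int], words: list[str]) -> list[int]:
    """Return the numbers of messages that contain every query word."""
    if not words:
        return []
    bits = -1  # all bits set; ANDing narrows it down
    for w in words:
        bits &= index.get(stem(w.lower()), 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```

A multi-word query is then a bitwise AND of a few integers, which is why lookups stay fast however large the archive is; the cost moves to building the index once up front.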

    - erik
