Mass-check not scanning all messages. - SpamAssassin


  1. Mass-check not scanning all messages.


    I have a custom spam corpus that I am trying to run rules against to test
    their effectiveness. However, mass-check will only scan a few ( < 5 ) of the
    spam messages and usually only 1 or 2 of the ham messages. Any clues? After
    roughly a week of googling, I can't find anyone with this exact problem.

    Thanks
    --
    View this message in context: http://www.nabble.com/Mass-check-not...p18916106.html
    Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


  2. Re: Mass-check not scanning all messages.

    On Sun, Aug 10, 2008 at 12:16:38PM -0700, RN-Chris wrote:
    > I have a custom spam corpus that I am trying to run rules against to test
    > their effectiveness however mass-check will only scan a few ( < 5 ) messages
    > of the spam and usually only 1 or 2 of the ham messages. Any clues? Roughly
    > a week of googling and I can't find anyone with this exact problem.


    Can you be more specific about what you're doing, how your corpus
    is set up, etc.? You've essentially said "things don't work, what's
    wrong".

    Some random thoughts: Do you have mbox files but are not specifying them
    as such? Are the majority of messages > 250k?

    --
    Randomly Selected Tagline:
    "Besides, I think [Slackware] sounds better than 'Microsoft,' don't you?"
    - Patrick Volkerding


  3. Re: Mass-check not scanning all messages.


    I made a small bash wrapper script so I could scan a few different
    corpora; the commands it executes are shown below. I specified the --all
    switch, so large messages should not be an issue.

    In the two respective corpus directories (ham | spam) emails are just
    dumped in there. I looked at spam.log and ham.log and then at the
    corresponding messages, thinking there was something special about those
    messages, but everything looks normal about them. :/

    $WORKINGDIR/mass-check --progress --all --showdots \
    ham:mbox:/var/home/c/h/chris/spamcorpus/custom/ham \
    spam:mbox:/var/home/c/h/chris/spamcorpus/custom/spam

    $WORKINGDIR/hit-frequencies -x -p -a > freqs
    egrep '(LOCAL|OVERALL%)' $WORKINGDIR/freqs



  4. Re: Mass-check not scanning all messages.


    If ham and spam are directories containing mboxes, you might be better
    off with this:

    $WORKINGDIR/mass-check --progress --all --showdots \
    ham:mbox:/var/home/c/h/chris/spamcorpus/custom/ham/* \
    spam:mbox:/var/home/c/h/chris/spamcorpus/custom/spam/*

    I usually use ham:detect:/path/to/dir and then call mbox files "foo.mbox".

    --j.



  5. RE: Mass-check not scanning all messages.

    I added /* to the end of the dir paths, but that didn't change anything. The
    mails do have a somewhat weird naming convention. They were used on IMAP, so
    a sample filename would be something like this:

    1214839027.5368_1.servername:2,

    However, if the problem were with scanning that type of filename, I would
    think it wouldn't still scan 4 messages. I would expect it to scan zero
    messages if it were a naming problem.

    -- Chris


  6. Re: Mass-check not scanning all messages.

    On 10/08/2008 4:11 PM, RN-Chris wrote:
    > In the two respective corpus directories (ham | spam) emails are just
    > dumped in there.


    > $WORKINGDIR/mass-check --progress --all --showdots \
    > ham:mbox:/var/home/c/h/chris/spamcorpus/custom/ham \
    > spam:mbox:/var/home/c/h/chris/spamcorpus/custom/spam


    Use "dir", not "mbox".


  7. Re: Mass-check not scanning all messages.


    Hmm, that sounds like the corpus isn't made up of mboxes. Are you sure it is?
    If not, just use "ham:detect:" or "ham:dir:".

    --j.
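
A quick way to check the question above is to look at the first line of each file in a corpus directory. An mbox file begins with a "From " separator line (no colon), while a one-message-per-file mail usually begins with a header such as Return-Path: or Received:. The sketch below is only an illustration; the function name count_mbox_like is hypothetical and is not part of mass-check.

```shell
# Hypothetical helper (not part of SpamAssassin): report how many regular
# files in a directory begin with an mbox "From " separator line.
count_mbox_like() {
    dir=$1
    total=0
    mboxlike=0
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        total=$((total + 1))
        # mbox files start with "From " followed by sender and date.
        if head -n 1 "$f" | grep -q '^From '; then
            mboxlike=$((mboxlike + 1))
        fi
    done
    echo "$mboxlike of $total files start with an mbox 'From ' line"
}
```

If almost no files in the ham or spam directory match, the directory most likely holds individual messages rather than mboxes, which points to "dir" or "detect" as the format to try.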


  8. RE: Mass-check not scanning all messages.

    J,

    Really appreciate it, sir. I wasn't aware that mbox was a switch. I changed
    it to spam:dir and ham:dir, removed the trailing /*, and it worked (or at
    least appeared to work) wonderfully.

    Thanks to all that helped.

    -- Chris
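
For a wrapper script that handles several corpora, the fix above can be sketched as a small helper that builds the class:format:location target from the path itself. This is a minimal sketch under stated assumptions: the function name corpus_target is hypothetical, and it only chooses between the "dir" and "mbox" formats mentioned in this thread (a directory is treated as one message per file, a regular file as a single mbox).

```shell
# Hypothetical helper (not part of SpamAssassin): build a mass-check target
# string of the form class:format:location.
corpus_target() {
    class=$1   # "ham" or "spam"
    path=$2
    if [ -d "$path" ]; then
        # Assumption: a directory holds one message per file.
        printf '%s:dir:%s\n' "$class" "$path"
    elif [ -f "$path" ]; then
        # Assumption: a single regular file is an mbox.
        printf '%s:mbox:%s\n' "$class" "$path"
    else
        echo "corpus_target: no such file or directory: $path" >&2
        return 1
    fi
}
```

With the corpus layout from this thread, the wrapper could then call mass-check like this:

    $WORKINGDIR/mass-check --progress --all --showdots \
    "$(corpus_target ham /var/home/c/h/chris/spamcorpus/custom/ham)" \
    "$(corpus_target spam /var/home/c/h/chris/spamcorpus/custom/spam)"

Both arguments expand to ham:dir:... and spam:dir:..., matching the change that resolved the problem.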

