On 6/23/2008 4:36 PM, John GALLET wrote:
> Hi,
>
> First of all, thanks to Justin for patiently helping me to install
> mass-check and pointing me in the right direction. I will try to run the
> algorithms tonight to see what they come up with.
>
> In the meantime, you can find a hit-frequencies report at:
> http://www.saphirtech.fr/spam/freqs_2008_06_23.txt
>
> All rules are prefixed with FR_ and are available in the same directory.
>
> I must say I did not double check for stray spam in my mailbox before
> using it as a ham corpus but it *should* be clean. I'll double check for
> next run. The spam corpus was 100% French spam, hand-picked over the
> last week through the "probably-spam" class (default score values 5-15).
>
> Any feedback on the results (not enough in corpus, bad rules, good
> rules, etc.) appreciated.


I excluded the last two rules from my mass-check to avoid FPs: these
ESPs/X-Mailers are definitely grey, the "import rcpt list and blast" sort
of ESPs, not black enough for global use.


#counts FR_SPAMISLEGAL        8s/2h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_SPAMISLEGAL_2      5s/2h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_NOTSPAM            0s/0h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_PAYLESSTAXES       0s/0h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_REALESTATE_INVEST  0s/0h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_ONLINEGAMBLING     0s/0h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_ONLINEMEDS         0s/0h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_REASON_SUBSCRIBE   1s/1h  of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
#counts FR_HOWTOUNSUBSCRIBE   7s/16h of 3859 corpus (1166s/2693h AXB-MC1) 06/23/08
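
To put the counts above in perspective, here is a minimal sketch that turns
one "#counts" line into spam and ham hit percentages. It is not part of
mass-check; the names hit_rates and COUNTS_RE are made up for illustration,
and it assumes the fields mean "<spam hits>s/<ham hits>h of <total> corpus
(<total spam>s/<total ham>h ...)".

```python
import re

# Hypothetical pattern for the "#counts" lines shown above (assumed layout:
# rule name, spam hits, ham hits, corpus size, total spam, total ham).
COUNTS_RE = re.compile(
    r"#counts\s+(?P<rule>\S+)\s+(?P<s>\d+)s/(?P<h>\d+)h\s+of\s+\d+\s+corpus\s+"
    r"\((?P<ts>\d+)s/(?P<th>\d+)h"
)


def hit_rates(line):
    """Return (rule, spam_pct, ham_pct) for one '#counts' line."""
    # Collapse whitespace so a line wrapped by a mail client still matches.
    match = COUNTS_RE.search(" ".join(line.split()))
    if match is None:
        raise ValueError("not a #counts line: %r" % line)
    spam_hits = int(match.group("s"))
    ham_hits = int(match.group("h"))
    total_spam = int(match.group("ts"))
    total_ham = int(match.group("th"))
    # Guard against an empty spam or ham corpus.
    spam_pct = 100.0 * spam_hits / total_spam if total_spam else 0.0
    ham_pct = 100.0 * ham_hits / total_ham if total_ham else 0.0
    return match.group("rule"), spam_pct, ham_pct


if __name__ == "__main__":
    sample = ("#counts FR_HOWTOUNSUBSCRIBE 7s/16h of 3859 corpus "
              "(1166s/2693h AXB-MC1) 06/23/08")
    rule, spam_pct, ham_pct = hit_rates(sample)
    print("%s: %.2f%% spam, %.2f%% ham" % (rule, spam_pct, ham_pct))
```

Under that reading, FR_SPAMISLEGAL hits about 0.69% of spam (8/1166) and
0.07% of ham (2/2693), while FR_HOWTOUNSUBSCRIBE hits about 0.60% of spam
(7/1166) and 0.59% of ham (16/2693), so it barely separates the two.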

If these are the hit rates on a very minimal daily corpus, I don't know
whether the present ruleset is ready for production, unless you have zero
tolerance for any bulk mail, period.