Junk Mail Controls stopped working, mostly - Mozilla



Thread: Junk Mail Controls stopped working, mostly

  1. Re: Junk Mail Controls stopped working, mostly

    Moz Champion (Dan) wrote:
    > Penthor-Mul wrote:
    >> Moz Champion (Dan) wrote:
    >>> Snagglepuss wrote:
    >>>> Brian Heinrich wrote:
    >>>>> On 2006-12-30 13:26 (-0700 UTC), Snagglepuss wrote:
    >>>>>
    >>>>>> Penthor-Mul wrote:
    >>>>>>> I get a lot of spam, 50-75/day.
    >>>>>>>
    >>>>>>> Penthor
    >>>>>>
    >>>>>> is that all. I get around 300 a day, and thats from one account.
    >>>>>> And I'm sure someone will say they get more.
    >>>>>
    >>>>> A friend of mine gets about 1 200 e-mail a day, about 400 of which
    >>>>> make it to his inbox . . . and 30-40% of that amount are legitimate.
    >>>>>
    >>>>> /b.
    >>>>>
    >>>> well, there you go. I said someone would reply, didn't I :-)
    >>>>
    >>>
    >>>
    >>> I used to get 300 per day on my accounts, and that was eight years ago.
    >>> So I did something about it... I fought back. I have much less now,
    >>> and spam has increased 20 or 30 times since then.
    >>>
    >>> but still 904K after one day is still excessive. Even with 300 per
    >>> day that means 3KB for each message! 3,000 bytes for each message!
    >>> Heck, some spam isn't that big! And since much spam is duplicated, having
    >>> 904KB after 1 day (and 300 spam) is really excessive.

    >>
    >> You fought back? How so Dan? Pipe bombs
    >> Seriously, how?
    >>
    >> Penthor

    >
    > Fighting back
    >
    > Get spam
    > Go to spam site (to see if its still there!)
    > do a traceroute to the spam site
    > report the site to its host for spamming
    >
    > Do the same for any 'image' site used in the spam
    > and for any 'sign-off' site as well.
    >
    > I use Visual Route for tracing, but it can be done with NeoTrace or just
    > the normal utilities on any computer (trace route/dig etc)
    >
    > Its simple, its easy, and effective. Can reduce spam intake up to 99%
    > (mine was reduced from 200/300 per day (and this in 1997 - it would be
    > over 2000 today) to less than 10 per week.
    >
    > Most ISPs will take action within the week, some large ones take two
    > weeks (AT&T and China.com for example). Spammer cant sell anything with
    > no web site, hurts them where it counts.
    >

    The easier way is to register with spamcop.net and install the okopipi
    extension.

    https://addons.mozilla.org/thunderbird/2672/

    Phil

  2. Re: Junk Mail Controls stopped working, mostly

    Phil Randal wrote:
    > Moz Champion (Dan) wrote:
    >> Penthor-Mul wrote:
    >>> Moz Champion (Dan) wrote:
    >>>> Snagglepuss wrote:
    >>>>> Brian Heinrich wrote:
    >>>>>> On 2006-12-30 13:26 (-0700 UTC), Snagglepuss wrote:
    >>>>>>
    >>>>>>> Penthor-Mul wrote:
    >>>>>>>> I get a lot of spam, 50-75/day.
    >>>>>>>>
    >>>>>>>> Penthor
    >>>>>>>
    >>>>>>> is that all. I get around 300 a day, and thats from one
    >>>>>>> account. And I'm sure someone will say they get more.
    >>>>>>
    >>>>>> A friend of mine gets about 1 200 e-mail a day, about 400 of which
    >>>>>> make it to his inbox . . . and 30-40% of that amount are legitimate.
    >>>>>>
    >>>>>> /b.
    >>>>>>
    >>>>> well, there you go. I said someone would reply, didn't I :-)
    >>>>>
    >>>>
    >>>>
    >>>> I used to get 300 per day on my accounts, and that was eight years ago.
    >>>> So I did something about it... I fought back. I have much less now,
    >>>> and spam has increased 20 or 30 times since then.
    >>>>
    >>>> but still 904K after one day is still excessive. Even with 300 per
    >>>> day that means 3KB for each message! 3,000 bytes for each message!
    >>>> Heck, some spam isn't that big! And since much spam is duplicated, having
    >>>> 904KB after 1 day (and 300 spam) is really excessive.
    >>>
    >>> You fought back? How so Dan? Pipe bombs
    >>> Seriously, how?
    >>>
    >>> Penthor

    >>
    >> Fighting back
    >>
    >> Get spam
    >> Go to spam site (to see if its still there!)
    >> do a traceroute to the spam site
    >> report the site to its host for spamming
    >>
    >> Do the same for any 'image' site used in the spam
    >> and for any 'sign-off' site as well.
    >>
    >> I use Visual Route for tracing, but it can be done with NeoTrace or
    >> just the normal utilities on any computer (trace route/dig etc)
    >>
    >> Its simple, its easy, and effective. Can reduce spam intake up to 99%
    >> (mine was reduced from 200/300 per day (and this in 1997 - it would be
    >> over 2000 today) to less than 10 per week.
    >>
    >> Most ISPs will take action within the week, some large ones take two
    >> weeks (AT&T and China.com for example). Spammer cant sell anything
    >> with no web site, hurts them where it counts.
    >>

    > The easier way is to register with spamcop.net and install the okopipi
    > extension.
    >
    > https://addons.mozilla.org/thunderbird/2672/
    >
    > Phil



    Easier, yep. As effective? Not that I have seen.

    Many ISPs I deal with don't accept automated Junk/Spam reports; they junk
    them on sight, so they don't act on them.

    All the sites I report get closed - in about two weeks maximum. Do all
    the sites you report using that tool get closed?



    Did your spam get reduced on the order of 99% when you started using it?

  3. Re: Junk Mail Controls stopped working, mostly

    On 2007-01-02 07:28 (-0700 UTC), Moz Champion (Dan) wrote:

    > Brian Heinrich wrote:




    >> 'Pends on how many unique strings there are. Hence why I recommend
    >> the Bayes Junk Tool to prune training.dat and keep things functioning
    >> smoothly. . . .

    >
    > Um, why?
    > Why use another program/tool unless you need it?


    Because that's how the Bayesian filters work. For the filters to work
    properly, they require ham tokens as well as spam tokens.

    Obviously, the whole Bayesian craze started with Paul Graham's article, 'A
    Plan for Spam'. If you read [link missing],
    you'll get a better idea of the importance of marking ham as such --
    especially given the now near-normative use of Bayesian poisoning to try to
    degrade the efficiency and efficacy of Bayesian filters.
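
A minimal sketch of why the filter wants ham tokens as well as spam tokens, using the per-token formula from Graham's 'A Plan for Spam' as best I recall it. This is not Thunderbird's actual JMC code; the function names and the sample counts are made up for illustration.

```python
def token_spam_probability(good, bad, n_good, n_bad):
    """Graham-style spam probability for a single token.

    good, bad: how often the token was seen in ham and in spam.
    n_good, n_bad: how many ham and spam messages were trained on.
    Returns None when the token is too rare to be scored.
    """
    g = 2 * good  # ham counts are doubled to bias against false positives
    b = bad
    if g + b < 5:
        return None
    ham_freq = min(1.0, g / n_good) if n_good else 0.0
    spam_freq = min(1.0, b / n_bad) if n_bad else 0.0
    if ham_freq + spam_freq == 0:
        return None
    p = spam_freq / (ham_freq + spam_freq)
    # Clamp so that no single token is ever treated as certain.
    return max(0.01, min(0.99, p))


def combined_probability(probs):
    """Combine per-token probabilities into one score for a message."""
    prod_p = 1.0
    prod_q = 1.0
    for p in probs:
        prod_p *= p
        prod_q *= 1.0 - p
    return prod_p / (prod_p + prod_q)


# An ordinary word that spammers also sprinkle into their mail.
# Trained on spam only: the word looks almost purely spammy.
spam_only = token_spam_probability(good=0, bad=5, n_good=0, n_bad=300)
# Trained on ham too: the same word is recognised as mostly innocent.
balanced = token_spam_probability(good=40, bad=5, n_good=200, n_bad=300)
```

With these made-up counts the spam-only database scores the word at the 0.99 ceiling, while the balanced database scores it at about 0.04. That is the poisoning problem in miniature: without ham counts, any ordinary word that turns up in spam looks like a spam word.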

    According to what you've written elsewhere, you started with a new JMC six
    months ago, and your advice to those whose JMC are showing signs of degraded
    performance is to delete training.dat and start from scratch.

    But by that point, the user will already have a significant database of
    tokens -- by your method, largely spam; by my method, a more balanced ratio
    of spam to ham.

    Wouldn't it just be easier to improve the performance of JMC by clearing out
    the cruft rather than starting from scratch and having to retrain?

    Sure, you'll still have to mark (uncaught) spam as spam and -- at least if
    you want JMC to function properly -- mark ham as ham. But you won't have to
    retrain the filter.

    (See [links missing] for more info. . . .)

    > For example: My JMC is catching 99.8% of all spam sent to me
    > False Positive ratio is zero (none in six months)
    >
    > Why would I have to fiddle with it? All I do is mark those that are spam
    > as spam, and unmark any it catches which are not
    >
    > No fiddling required - no pruning required.
    >
    > Now, you recommend that you mark good messages as non-junk, but then you
    > recommend 'pruning' the training.dat. May I ask why?


    See above.

    > compared to mine it seems you are just wasting time and energy. Dont
    > bloat the file in the first place and you wouldnt have to prune it later
    > it seems to me.


    Given the nature of spam, it's going to bloat anyway. That's the point of
    pruning it. The string |2340894-03089-34| might occur in one spam, and
    |asdlijrq23098| in another, and so on and so forth -- this is the crap that
    bloats the file.
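
The pruning idea can be sketched on an in-memory token table. This is only an illustration under stated assumptions: the real training.dat is a binary file, the dictionary layout, the `prune_tokens` name, and the `min_count` threshold are all hypothetical, and nothing here reproduces what the Bayes Junk Tool actually does.

```python
def prune_tokens(tokens, min_count=2):
    """Drop tokens seen fewer than min_count times in total.

    tokens maps a token string to a (ham_count, spam_count) pair.
    One-off random strings never recur, so they carry no predictive value.
    """
    return {
        tok: (ham, spam)
        for tok, (ham, spam) in tokens.items()
        if ham + spam >= min_count
    }


# Made-up sample table: two one-off junk strings and two useful tokens.
tokens = {
    "2340894-03089-34": (0, 1),
    "asdlijrq23098": (0, 1),
    "mortgage": (0, 57),
    "meeting": (42, 1),
}
pruned = prune_tokens(tokens)
```

Here the two random strings are dropped and the two recurring tokens survive, so the table shrinks without losing anything the filter could use.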

    > If your catch ratio was higher than mine, you might have a point. But it
    > isnt.


    How do you know?

    > Likewise if your false postive ratio was lower than mine, but
    > again it isnt.


    How do you know?

    > So what, besides the technical data available, is the
    > value of the tool. I have it, but dont see a requirement for it at all.


    Dan, you and I will never agree on this. If you want to keep deleting your
    training.dat when its performance begins to degrade, and if you only want to
    mark spam, that's fine. But if, as a user, I were to hear that Tb's JMC were
    as unstable as that, I'd really question the point of them if I have to
    reset them every three or four months.

    With my recommendation, you /maintain/ the database by pruning out the cruft
    that's resulting in degraded performance and also by feeding the database
    enough spam /and/ ham tokens that it can properly filter mail.

    /b.

    --
    'There is caution, and there is irrational paranoia.' -- Ron Hunter

    'And, yes, I AM a bit paranoid. After 25 years online, one gets that way,
    if one survives.' -- Ron Hunter

  4. Re: Junk Mail Controls stopped working, mostly

    Brian Heinrich wrote:
    > On 2007-01-02 07:28 (-0700 UTC), Moz Champion (Dan) wrote:
    >
    >> Brian Heinrich wrote:

    >
    >
    >
    >>> 'Pends on how many unique strings there are. Hence why I recommend
    >>> the Bayes Junk Tool to prune training.dat and keep things functioning
    >>> smoothly. . . .

    >>
    >> Um, why?
    >> Why use another program/tool unless you need it?

    >
    > Because that's how the Bayesian filters work. For the filters to work
    > properly, they require ham tokens as well as spam tokens.
    >
    > Obviously, the whole Bayesian craze started with Paul Graham's article,
    > 'A Plan for Spam'. If you read [link missing],
    > you'll get a better idea of the importance of marking ham as such --
    > especially given the now near-normative use of Bayesian poisoning to try
    > to degrade the efficiency and efficacy of Bayesian filters.
    >
    > According to what you've written elsewhere, you started with a new JMC
    > six months ago, and your advice to those whose JMC are showing signs of
    > degraded performance is to delete training.dat and start from scratch.
    >
    > But by that point, the user will already have a significant database of
    > tokens -- by your method, largely spam; by my method, a more balanced
    > ratio of spam to ham.
    >
    > Wouldn't it just be easier to improve the performance of JMC by clearing
    > out the cruft rather than starting from scratch and having to retrain?
    >
    > Sure, you'll still have to mark (uncaught) spam as spam and -- at least
    > if you want JMC to function properly -- mark ham as ham. But you won't
    > have to retrain the filter.
    >
    > (See [links missing] for more info. . . .)
    >
    >> For example: My JMC is catching 99.8% of all spam sent to me
    >> False Positive ratio is zero (none in six months)
    >>
    >> Why would I have to fiddle with it? All I do is mark those that are
    >> spam as spam, and unmark any it catches which are not
    >>
    >> No fiddling required - no pruning required.
    >>
    >> Now, you recommend that you mark good messages as non-junk, but then
    >> you recommend 'pruning' the training.dat. May I ask why?

    >
    > See above.
    >
    >> compared to mine it seems you are just wasting time and energy. Dont
    >> bloat the file in the first place and you wouldnt have to prune it
    >> later it seems to me.

    >
    > Given the nature of spam, it's going to bloat anyway. That's the point
    > of pruning it. The string |2340894-03089-34| might occur in one spam,
    > and |asdlijrq23098| in another, and so on and so forth -- this is the
    > crap that bloats the file.
    >
    >> If your catch ratio was higher than mine, you might have a point. But
    >> it isnt.

    >
    > How do you know?
    >
    >> Likewise if your false postive ratio was lower than mine, but again it
    >> isnt.

    >
    > How do you know?
    >
    >> So what, besides the technical data available, is the value of the
    >> tool. I have it, but dont see a requirement for it at all.

    >
    > Dan, you and I will never agree on this. If you want to keep deleting
    > your training.dat when its performance begins to degrade, and if you
    > only want to mark spam, that's fine. But if, as a user, I were to hear
    > that Tb's JMC were as unstable as that, I'd really question the point of
    > them if I have to reset them every three or four months.
    >
    > With my recommendation, you /maintain/ the database by pruning out the
    > cruft that's resulting in degraded performance and also by feeding the
    > database enough spam /and/ ham tokens that it can properly filter mail.
    >
    > /b.
    >



    Sure, Bayesian filters need SOME 'good' mails to operate well, but they
    don't need ALL of them!

    Once again, my JMC is catching 99.8% of spam sent my way... So mine IS
    working well, very well indeed! Is yours at that percentage?
    For every 200 spam sent my way, ONE single solitary one makes it into my
    inbox, so please tell me why you think I need to mark 'good' messages as
    non-junk again?

    I have NEVER had to dump my training.dat. I tell people with larger
    training.dat files who are having problems to do so, yes, and it clears
    their problem!

    Well, IS your catch ratio BETTER than mine? Is it up to 99.8% or not?
    I know for a fact that your false positive ratio is NOT better, because
    mine is at zero, and you can't get better than that.


    Why 'tweak' something that is working at 99.8% efficiency? I don't have
    to prune anything, I don't have to adjust anything, I simply mark spam as
    spam, and unmark anything that isn't. Real simple. I don't have to DUMP my
    training.dat, because I am NOT bloating it up by marking known good
    messages!

    The person in this thread did what you suggested and ended up with a
    904KB training.dat after one day and 300 spam... and problems begin over
    the 1MB level. Yet my JMC has caught well in excess of 1000 spam in 6
    months with NO false positives and my training.dat is 340KB - at this
    rate I could operate for over a year and still not reach what he did in
    ONE day! So tell me again, what value is there in marking known good
    messages?

    Heck, mine is at 99.8 percent efficiency, you can't even improve it much!
    I suppose you COULD be at 99.9% - seems a lot of work to catch one more
    spam. And no filter is 100%. Why should I use another entire
    program/utility and learn how to use it to catch one more spam? Not
    worth it.

    It's REALLY simple. All I have to do is mark spam that is spam as junk,
    and unmark any messages that are not spam as non junk. You have to do
    that as well. But that's where I am finished! Done, complete! While you
    have to go into the utility, tweak the filter, and prune your files.
    Lots of work, for what? ONE more spam you catch? And IS your catch ratio
    at 99.8 or better anyway? YOU are the one doing more work, not me.

  5. Re: Junk Mail Controls stopped working, mostly

    Moz Champion (Dan) wrote:
    > Penthor-Mul wrote:
    >> Moz Champion (Dan) wrote:
    >>> Penthor-Mul wrote:
    >>>> Penthor-Mul wrote:
    >>>>> Snagglepuss wrote:
    >>>>>> Penthor-Mul wrote:
    >>>>>>> Moz Champion (Dan) wrote:
    >>>>>>>> Penthor-Mul wrote:
    >>>>>>>>> version 1.5.0.9 (20061207)
    >>>>>>>>>
    >>>>>>>>> Until very recently they worked pretty good. Now about 50% of
    >>>>>>>>> the spam ends up in my inbox. I deleted all the .msf files in
    >>>>>>>>> my profile, emptied the Trash, and compacted my folders. These
    >>>>>>>>> actions don't seem to have helped. Is there anything else I
    >>>>>>>>> can try? Is this problem related to a recent update?
    >>>>>>>>>
    >>>>>>>>> Penthor
    >>>>>>>>
    >>>>>>>> How big is your training.dat file? Your training.dat file can be
    >>>>>>>> found in your profile.
    >>>>>>>>
    >>>>>>>> Is JMC (Junk Mail Controls) still getting some spam/junk? Have
    >>>>>>>> you checked the settings to ensure that it is still enabled on
    >>>>>>>> all accounts?
    >>>>>>>
    >>>>>>> Training.dat = 6MB
    >>>>>>>
    >>>>>>> The JMC is catching about 50% of the spam and correctly routing
    >>>>>>> it to the Junk folder. JMC is still activated for my account.
    >>>>>>>
    >>>>>>> I remember there was a place to set the "aggressiveness" of the
    >>>>>>> JMC between 0 & 100, but I forgot where that setting is. I think
    >>>>>>> I had set it fairly aggressive; about 5 or less.
    >>>>>>>
    >>>>>>> N
    >>>>>> SIX MEGS!!!! Yikes. I would delete the file and start again.
    >>>>>
    >>>>> OK, I'll shut down and dump that bad boy. I had no idea there was
    >>>>> a size limit for that file. I guess I missed that. Thanks for the
    >>>>> education.
    >>>>
    >>>> OK, I renamed training.dat to training.old. The new training.dat
    >>>> was created. It is already 904KB. The jury is still out on how
    >>>> effective the filter is now. I'll know in a week or so; I get a lot
    >>>> of spam, 50-75/day.
    >>>>
    >>>> Penthor
    >>>
    >>>
    >>> Tell me, are you marking non junk as non junk with the aid of Mnenhy?
    >>> My training.dat is 380KB and thats after 6 months! (Active since July)
    >>> 904KB after one day is really excessive

    >>
    >> Sorry for the delay - out of town.
    >>
    >> I am marking the good messages as "not junk". I believe my
    >> training.dat started at over 500KB after it was re-created. I am
    >> still having to mark some spam that ends up in inbox...but I think it
    >> has decreased. Progress, even a little, is good, yes?
    >>
    >> Penthor

    >
    >
    > No, I disagree with marking good messages as non-junk. It bloats the
    > training.dat for no good reason. And a large training.dat has been shown
    > to be prone to making JMC less effective.
    >
    > Marking good messages as non junk will decrease the occurrence of false
    > positives somewhat... but in most cases that's so low in the first place,
    > it's not even a problem. I haven't had a real 'false' positive on this
    > system yet - with my 'in address book' exception anyway.
    >
    >
    > Again, training.dat files over 1MB have been shown to impart less
    > effectiveness, and here you are with ONE day and a training.dat of 904K?
    > Almost at the limit at which problems may occur, after only one day?
    > That's what marking known good messages does.
    >
    > Once more. I have a training.dat file of 380K... after over six months
    > of use (10 Jul 2006 first use) - and my catch ratio is 99.8% (for every
    > spam I see in my inbox, JMC has already caught 200). I have not had a
    > false positive in six months, so I figure that's a low enough ratio;
    > you can't get better than 0.
    >
    > My suggestion to you is to desist from marking good emails as non-junk.


    OK, I will try that. I renamed training.dat again (it was over 3MB
    already) and it was recreated. I must have been mistaken the first time
    about the initial size, because it was 50KB this time. I trained about
    100 mails and it is 91KB now. I'll monitor the size. Thanks for all
    the advice.
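
Monitoring the file size can be scripted. A minimal sketch, assuming you pass in the path to training.dat in your own profile folder; the 1MB figure is just the rule of thumb mentioned in this thread, not a documented limit, and the function name is made up.

```python
import os

# Rule-of-thumb threshold from this thread, not an official limit.
ONE_MB = 1024 * 1024


def training_dat_status(path, limit=ONE_MB):
    """Return (size_in_bytes, over_limit) for the given training.dat path."""
    size = os.path.getsize(path)
    return size, size > limit


# Example use (the path is a placeholder for your own profile folder):
# size, too_big = training_dat_status("/path/to/profile/training.dat")
# print(f"{size / 1024:.0f}KB", "- over 1MB" if too_big else "- OK")
```

Running it after each training session shows how fast the file grows per batch of marked mail.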

  6. Re: Junk Mail Controls stopped working, mostly

    Moz Champion (Dan) wrote:
    > Penthor-Mul wrote:
    >> Moz Champion (Dan) wrote:
    >>> Snagglepuss wrote:
    >>>> Brian Heinrich wrote:
    >>>>> On 2006-12-30 13:26 (-0700 UTC), Snagglepuss wrote:
    >>>>>
    >>>>>> Penthor-Mul wrote:
    >>>>>>> I get a lot of spam, 50-75/day.
    >>>>>>>
    >>>>>>> Penthor
    >>>>>>
    >>>>>> is that all. I get around 300 a day, and thats from one account.
    >>>>>> And I'm sure someone will say they get more.
    >>>>>
    >>>>> A friend of mine gets about 1 200 e-mail a day, about 400 of which
    >>>>> make it to his inbox . . . and 30-40% of that amount are legitimate.
    >>>>>
    >>>>> /b.
    >>>>>
    >>>> well, there you go. I said someone would reply, didn't I :-)
    >>>>
    >>>
    >>>
    >>> I used to get 300 per day on my accounts, and that was eight years ago.
    >>> So I did something about it... I fought back. I have much less now,
    >>> and spam has increased 20 or 30 times since then.
    >>>
    >>> but still 904K after one day is still excessive. Even with 300 per
    >>> day that means 3KB for each message! 3,000 bytes for each message!
    >>> Heck, some spam isn't that big! And since much spam is duplicated, having
    >>> 904KB after 1 day (and 300 spam) is really excessive.

    >>
    >> You fought back? How so Dan? Pipe bombs
    >> Seriously, how?
    >>
    >> Penthor

    >
    > Fighting back
    >
    > Get spam
    > Go to spam site (to see if its still there!)
    > do a traceroute to the spam site
    > report the site to its host for spamming
    >
    > Do the same for any 'image' site used in the spam
    > and for any 'sign-off' site as well.
    >
    > I use Visual Route for tracing, but it can be done with NeoTrace or just
    > the normal utilities on any computer (trace route/dig etc)
    >
    > Its simple, its easy, and effective. Can reduce spam intake up to 99%
    > (mine was reduced from 200/300 per day (and this in 1997 - it would be
    > over 2000 today) to less than 10 per week.
    >
    > Most ISPs will take action within the week, some large ones take two
    > weeks (AT&T and China.com for example). Spammer cant sell anything with
    > no web site, hurts them where it counts.


    Oooooo, I'd hate to have to track down and report all those spams. But if
    it will produce that kind of result..... Excuse my ignorance, but what
    are an 'image site' and a 'sign-off' site?
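
On my reading (an assumption, since it is not spelled out above), an 'image site' is the host serving the pictures embedded in the spam, and a 'sign-off' site is the host behind the "remove me" link. A minimal sketch for pulling both kinds of hostname out of an HTML spam body so they can be traced and reported; the class and function names and the sample message are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Collect the hostnames used by links and by embedded images."""

    def __init__(self):
        super().__init__()
        self.link_hosts = set()
        self.image_hosts = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            host = urlparse(attrs["href"]).hostname
            if host:
                self.link_hosts.add(host)
        elif tag == "img" and attrs.get("src"):
            host = urlparse(attrs["src"]).hostname
            if host:
                self.image_hosts.add(host)


def hosts_in_spam(html_body):
    """Return (link_hosts, image_hosts) as sorted lists."""
    collector = HostCollector()
    collector.feed(html_body)
    return sorted(collector.link_hosts), sorted(collector.image_hosts)


# Made-up sample message using reserved example domains.
sample = (
    '<a href="http://shop.example.com/buy">'
    '<img src="http://img.example.net/ad.gif"></a>'
    '<a href="http://optout.example.org/remove?id=1">remove me</a>'
)
links, images = hosts_in_spam(sample)
```

Each hostname found this way can then be fed to traceroute, dig, or a tool like Visual Route to find the hosting provider to report it to.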

  7. Re: Junk Mail Controls stopped working, mostly

    Penthor-Mul wrote:
    > Moz Champion (Dan) wrote:
    >> Penthor-Mul wrote:
    >>> Moz Champion (Dan) wrote:
    >>>> Penthor-Mul wrote:
    >>>>> Penthor-Mul wrote:
    >>>>>> Snagglepuss wrote:
    >>>>>>> Penthor-Mul wrote:
    >>>>>>>> Moz Champion (Dan) wrote:
    >>>>>>>>> Penthor-Mul wrote:
    >>>>>>>>>> version 1.5.0.9 (20061207)
    >>>>>>>>>>
    >>>>>>>>>> Until very recently they worked pretty good. Now about 50% of
    >>>>>>>>>> the spam ends up in my inbox. I deleted all the .msf files in
    >>>>>>>>>> my profile, emptied the Trash, and compacted my folders.
    >>>>>>>>>> These actions don't seem to have helped. Is there anything
    >>>>>>>>>> else I can try? Is this problem related to a recent update?
    >>>>>>>>>>
    >>>>>>>>>> Penthor
    >>>>>>>>>
    >>>>>>>>> How big is your training.dat file? Your training.dat file can
    >>>>>>>>> be found in your profile.
    >>>>>>>>>
    >>>>>>>>> Is JMC (Junk Mail Controls) still getting some spam/junk? Have
    >>>>>>>>> you checked the settings to ensure that it is still enabled on
    >>>>>>>>> all accounts?
    >>>>>>>>
    >>>>>>>> Training.dat = 6MB
    >>>>>>>>
    >>>>>>>> The JMC is catching about 50% of the spam and correctly routing
    >>>>>>>> it to the Junk folder. JMC is still activated for my account.
    >>>>>>>>
    >>>>>>>> I remember there was a place to set the "aggressiveness" of the
    >>>>>>>> JMC between 0 & 100, but I forgot where that setting is. I
    >>>>>>>> think I had set it fairly aggressive; about 5 or less.
    >>>>>>>>
    >>>>>>>> N
    >>>>>>> SIX MEGS!!!! Yikes. I would delete the file and start again.
    >>>>>>
    >>>>>> OK, I'll shut down and dump that bad boy. I had no idea there was
    >>>>>> a size limit for that file. I guess I missed that. Thanks for
    >>>>>> the education.
    >>>>>
    >>>>> OK, I renamed training.dat to training.old. The new training.dat
    >>>>> was created. It is already 904KB. The jury is still out on how
    >>>>> effective the filter is now. I'll know in a week or so; I get a
    >>>>> lot of spam, 50-75/day.
    >>>>>
    >>>>> Penthor
    >>>>
    >>>>
    >>>> Tell me, are you marking non junk as non junk with the aid of Mnenhy?
    >>>> My training.dat is 380KB and thats after 6 months! (Active since July)
    >>>> 904KB after one day is really excessive
    >>>
    >>> Sorry for the delay - out of town.
    >>>
    >>> I am marking the good messages as "not junk". I believe my
    >>> training.dat started at over 500KB after it was re-created. I am
    >>> still having to mark some spam that ends up in inbox...but I think it
    >>> has decreased. Progress, even a little, is good, yes?
    >>>
    >>> Penthor

    >>
    >>
    >> No, I disagree with marking good messages as non-junk. It bloats the
    >> training.dat for no good reason. And a large training.dat has been shown
    >> to be prone to making JMC less effective.
    >>
    >> Marking good messages as non junk will decrease the occurrence of false
    >> positives somewhat... but in most cases that's so low in the first
    >> place, it's not even a problem. I haven't had a real 'false' positive on
    >> this system yet - with my 'in address book' exception anyway.
    >>
    >>
    >> Again, training.dat files over 1MB have been shown to impart less
    >> effectiveness, and here you are with ONE day and a training.dat of
    >> 904K? Almost at the limit at which problems may occur, after only one
    >> day? That's what marking known good messages does.
    >>
    >> Once more. I have a training.dat file of 380K... after over six months
    >> of use (10 Jul 2006 first use) - and my catch ratio is 99.8% (for
    >> every spam I see in my inbox, JMC has already caught 200). I have not
    >> had a false positive in six months, so I figure that's a low enough
    >> ratio; you can't get better than 0.
    >>
    >> My suggestion to you is to desist from marking good emails as non-junk.

    >
    > OK, I will try that. I renamed training.dat again (it was over 3MB
    > already) and it was recreated. I must have been mistaken the first time
    > about the initial size, because it was 50KB this time. I trained about
    > 100 mails and it is 91KB now. I'll monitor the size. Thanks for all
    > the advice.


    And I have no idea what my efficiency really is. I have about 13
    folders that receive filtered mail, not including the default folders.
    I don't always get to each folder each day to read the newsletters, etc.
    I only really try to keep up with the counts in my Inbox and Junk folders.
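
Working out the efficiency only needs four counts, taken across all folders rather than just Inbox and Junk. A minimal sketch; the function and parameter names are my own and you have to supply the counts yourself.

```python
def filter_stats(caught_spam, missed_spam, false_positives, good_mail):
    """Return (catch_rate, false_positive_rate) as fractions between 0 and 1.

    caught_spam: spam correctly moved to Junk
    missed_spam: spam left in the Inbox or any other folder
    false_positives: good mail wrongly moved to Junk
    good_mail: all good mail received, including the false positives
    """
    total_spam = caught_spam + missed_spam
    catch_rate = caught_spam / total_spam if total_spam else 0.0
    fp_rate = false_positives / good_mail if good_mail else 0.0
    return catch_rate, fp_rate


# The figure quoted above: 200 caught for each one that reaches the inbox.
catch_200, _ = filter_stats(200, 1, 0, 100)
# What a 99.8% catch rate corresponds to: one miss in every 500 spam.
catch_500, _ = filter_stats(499, 1, 0, 100)
```

By this arithmetic, 200 caught per one missed works out to about 99.5%, while 99.8% would mean only one miss in 500.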

  8. Re: Junk Mail Controls stopped working, mostly

    On 2007-01-02 16:12 (-0700 UTC), Moz Champion (Dan) wrote:

    > Brian Heinrich wrote:
    >> On 2007-01-02 07:28 (-0700 UTC), Moz Champion (Dan) wrote:
    >>
    >>> Brian Heinrich wrote:

    >>
    >>
    >>
    >>>> 'Pends on how many unique strings there are. Hence why I recommend
    >>>> the Bayes Junk Tool to prune training.dat and keep things
    >>>> functioning smoothly. . . .
    >>>
    >>> Um, why?
    >>> Why use another program/tool unless you need it?

    >>
    >> Because that's how the Bayesian filters work. For the filters to work
    >> properly, they require ham tokens as well as spam tokens.
    >>
    >> Obviously, the whole Bayesian craze started with Paul Graham's
    >> article, 'A Plan for Spam'. If you read [link missing],
    >> you'll get a better idea of the importance of marking ham as such --
    >> especially given the now near-normative use of Bayesian poisoning to
    >> try to degrade the efficiency and efficacy of Bayesian filters.
    >>
    >> According to what you've written elsewhere, you started with a new JMC
    >> six months ago, and your advice to those whose JMC are showing signs
    >> of degraded performance is to delete training.dat and start from scratch.
    >>
    >> But by that point, the user will already have a significant database
    >> of tokens -- by your method, largely spam; by my method, a more
    >> balanced ratio of spam to ham.
    >>
    >> Wouldn't it just be easier to improve the performance of JMC by
    >> clearing out the cruft rather than starting from scratch and having to
    >> retrain?
    >>
    >> Sure, you'll still have to mark (uncaught) spam as spam and -- at
    >> least if you want JMC to function properly -- mark ham as ham. But
    >> you won't have to retrain the filter.
    >>
    >> (See [links missing] for more info. . . .)
    >>
    >>> For example: My JMC is catching 99.8% of all spam sent to me
    >>> False Positive ratio is zero (none in six months)
    >>>
    >>> Why would I have to fiddle with it? All I do is mark those that are
    >>> spam as spam, and unmark any it catches which are not
    >>>
    >>> No fiddling required - no pruning required.
    >>>
    >>> Now, you recommend that you mark good messages as non-junk, but then
    >>> you recommend 'pruning' the training.dat. May I ask why?

    >>
    >> See above.
    >>
    >>> compared to mine it seems you are just wasting time and energy. Dont
    >>> bloat the file in the first place and you wouldnt have to prune it
    >>> later it seems to me.

    >>
    >> Given the nature of spam, it's going to bloat anyway. That's the
    >> point of pruning it. The string |2340894-03089-34| might occur in one
    >> spam, and |asdlijrq23098| in another, and so on and so forth -- this
    >> is the crap that bloats the file.
    >>
    >>> If your catch ratio was higher than mine, you might have a point. But
    >>> it isnt.

    >>
    >> How do you know?
    >>
    >>> Likewise if your false postive ratio was lower than mine, but again
    >>> it isnt.

    >>
    >> How do you know?
    >>
    >>> So what, besides the technical data available, is the value of the
    >>> tool. I have it, but dont see a requirement for it at all.

    >>
    >> Dan, you and I will never agree on this. If you want to keep deleting
    >> your training.dat when its performance begins to degrade, and if you
    >> only want to mark spam, that's fine. But if, as a user, I were to hear
    >> that Tb's JMC were as unstable as that, I'd really question the point
    >> of them if I have to reset them every three or four months.
    >>
    >> With my recommendation, you /maintain/ the database by pruning out the
    >> cruft that's resulting in degraded performance and also by feeding the
    >> database enough spam /and/ ham tokens that it can properly filter mail.
    >>
    >> /b.
    >>

    >
    >
    > Sure Baysian filters need SOME 'good' mails to operate well, but hey
    > dont need ALL of them!
    >
    > Once again, my JMC is catching 99.8% of spam sent my way... So mine IS
    > working well, very well indeed! Is yours at that percentage?
    > For every 200 spam sent my way, ONE single solitary one makes it into my
    > inbox, so please tell me why you think I need to mark 'good' messages as
    > non-junk again?
    >
    > I have NEVER had to dump my training.dat, I tell people with larger
    > training.dat files who are having problems to do so, yes, and it clears
    > their problem!
    >
    > Well, IS your catch ratio BETTER than mine? Is it up to 99.8% or not?
    > I know for a fact that your false positive ratio is NOT better, because
    > mine is at zero, and you cant get better than that.
    >
    >
    > Why 'tweak' something that is working at 99.8% efficiency? I don't have
    > to prune anything, I don't have to adjust anything, I simply mark spam as
    > spam, and unmark any that isn't. Real simple. I don't have to DUMP my
    > training.dat, because I am NOT bloating it up with marking known good
    > messages!
    >
    > The person in this thread did what you suggested and ended up with a
    > 904KB training.dat after one day and 300 spam... and problems begin over
    > the 1MB level. Yet my JMC has caught well in excess of 1000 spam in 6
    > months with NO false positives and my training.dat is 340KB - at this
    > rate I could operate for over a year and still not reach what he did in
    > ONE day! So tell me again, what value is there in marking known good
    > messages?
    >
    > Heck, mine is at 99.8 percent efficiency, you cant even improve it much!
    > I suppose you COULD be at 99.9% - seems a lot of work to catch one more
    > spam. And no filter is 100%. Why should I use another entire
    > program/utility and learn how to use it to catch one more spam? Not
    > worth it.
    >
    > It's REALLY simple. All I have to do is mark spam that is spam as junk,
    > and unmark any messages that are not spam as non junk. You have to do
    > that as well. But that's where I am finished! Done, complete! While you
    > have to go into the utility, tweak the filter, and prune your files.
    > Lots of work, for what? ONE more spam you catch? And IS your catch ratio
    > at 99.8 or better anyway? YOU are the one doing more work, not me.


    I'm simply not going to argue this with you any more, Dan. If you can't or
    /won't/ understand how Bayesian filters work, that's fine, but please don't
    go about telling people that they don't need to mark ham as well. There is
    a reason -- even if you refuse to acknowledge it -- that, even in the UI
    Junk Settings, it indicates that you need to mark both spam and ham --
    simply correcting false positives isn't entirely adequate.

    Since setting up this box six or so weeks back, my catch ratio has been 100%
    with one false positive.

    I just pruned my training.dat again, deleting good and bad tokens
    separately, using a threshold of 5 for the former and 20 for the latter,
    which, ironically, gave me a total of 2 201 good tokens and 2 201 bad tokens
    in a training.dat that's a whopping, um, 61 kB in size.
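
As a rough illustration of what that kind of pruning does, here is a minimal Python sketch. It is not the Bayes Junk Tool's code and it does not read the real training.dat format: the in-memory dictionaries and the name prune_tokens are hypothetical, and it assumes that a "threshold" means dropping tokens seen fewer times than that number, which may not be exactly how the tool defines it.

```python
def prune_tokens(counts, threshold):
    """Keep only tokens seen at least `threshold` times (assumed semantics)."""
    return {token: n for token, n in counts.items() if n >= threshold}


# Made-up token counts; the two random-looking strings are the kind of
# one-off cruft mentioned elsewhere in this thread.
good = {"meeting": 12, "invoice": 7, "xq9z": 1}
bad = {"refinance": 340, "2340894-03089-34": 1, "asdlijrq23098": 1, "mortgage": 25}

# Separate thresholds for good and bad tokens, as in the post above.
pruned_good = prune_tokens(good, 5)
pruned_bad = prune_tokens(bad, 20)
```

The one-off random strings fall away while the frequently seen tokens on both sides are kept, which is why the pruned file can be so much smaller.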

    Are you honestly asking us to believe that you have never, in the five years
    or so that JMC have been part of Moz/Tb, deleted or reset your training.dat
    and that it is a mere 340 kB in size even though you're getting roughly 200
    spam a month? That's something over 2 000 spam a year; over five years,
    that would be over 10 000 spam.

    And yet, despite all the tokens that would have been generated and despite
    the fact that you have never reset or pruned it, your training.dat is merely
    340 kB in size?

    You must excuse me if I find that somewhat . . . difficult . . . to believe. . . .

    /b.

    --
    'There is caution, and there is irrational paranoia.' -- Ron Hunter

    'And, yes, I AM a bit paranoid. After 25 years online, one gets that way,
    if one survives.' -- Ron Hunter

  9. Re: Junk Mail Controls stopped working, mostly

    On 2007-01-02 19:52 (-0700 UTC), Brian Heinrich wrote:



    > There is a reason -- even if you refuse to acknowledge it -- that, even
    > in the UI Junk Settings, it indicates that you need to mark both spam
    > and ham -- simply correcting false positives isn't entirely adequate.


    I've been playing with a Tuffmail trial account; in their Auto-Train/
    folder, there is a default message that points to
    .

    Note, in particular, the following:


    *Bayesian Classifier*

    *IMPORTANT:* You have to train with at least 1 ham and 1 spam message before
    the classifier will operate.

    *VERY IMPORTANT:* Enabling the classifier with a dozen ham messages and
    hundreds of spam messages, or vice versa, will most surely result in
    mis-classification of messages.
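
To make the point about lopsided training concrete, here is a small Python sketch of a per-token spam probability, loosely following the formula in Paul Graham's 'A Plan for Spam' (ham counts doubled, result clamped to 0.01-0.99). It is only an illustration: the function name is hypothetical, Graham's minimum-occurrence rule is left out, and neither Tuffmail's nor Thunderbird's actual scoring is claimed to work exactly this way.

```python
def token_spam_probability(good_count, bad_count, ngood, nbad):
    """Estimate how 'spammy' one token is, Graham-style.

    good_count / bad_count: times the token was seen in ham / spam.
    ngood / nbad: number of ham / spam messages trained on.
    """
    g = min(1.0, 2 * good_count / ngood)  # ham frequency, doubled to bias against false positives
    b = min(1.0, bad_count / nbad)        # spam frequency
    if g + b == 0:
        return 0.4  # assumed value for a never-seen token
    return max(0.01, min(0.99, b / (g + b)))


# Same token, same raw counts, different training corpora.
balanced = token_spam_probability(3, 3, ngood=300, nbad=300)
skewed = token_spam_probability(3, 3, ngood=12, nbad=300)
```

With these made-up numbers the balanced corpus gives roughly 0.33 and the skewed one roughly 0.02: the token's score depends heavily on how many messages of each kind the filter has seen, so a very small ham (or spam) corpus makes the estimates jumpy.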


    /b.




  10. Re: Junk Mail Controls stopped working, mostly

    Brian Heinrich wrote:
    > On 2007-01-02 16:12 (-0700 UTC), Moz Champion (Dan) wrote:
    >
    >> Brian Heinrich wrote:
    >>> On 2007-01-02 07:28 (-0700 UTC), Moz Champion (Dan) wrote:
    >>>
    >>>> Brian Heinrich wrote:
    >>>
    >>>
    >>>
    >>>>> 'Pends on how many unique strings there are. Hence why I recommend
    >>>>> the Bayes Junk Tool to prune training.dat and keep things
    >>>>> functioning smoothly. . . .
    >>>>
    >>>> Um, why?
    >>>> Why use another program/tool unless you need it?
    >>>
    >>> Because that's how the Bayesian filters work. For the filters to
    >>> work properly, they require ham tokens as well as spam tokens.
    >>>
    >>> Obviously, the whole Bayesian craze started with Paul Graham's
    >>> article, 'A Plan for Spam ().
    >>> If you read
    >>> ,
    >>> you'll get a better idea of the importance of marking ham as such --
    >>> especially given the now near-normative use of Bayesian poisoning to
    >>> try to degrade the efficiency and efficacy of Bayesian filters.
    >>>
    >>> According to what you've written elsewhere, you started with a new
    >>> JMC six months ago, and your advice to those whose JMC are showing
    >>> signs of degraded performance is to delete training.dat and start
    >>> from scratch.
    >>>
    >>> But by that point, the user will already have a significant database
    >>> of tokens -- by your method, largely spam; by my method, a more
    >>> balanced ratio of spam to ham.
    >>>
    >>> Wouldn't it just be easier to improve the performance of JMC by
    >>> clearing out the cruft rather than starting from scratch and having
    >>> to retrain?
    >>>
    >>> Sure, you'll still have to mark (uncaught) spam as spam and -- at
    >>> least if you want JMC to function properly -- mark ham as ham. But
    >>> you won't have to retrain the filter.
    >>>
    >>> (See and
    >>> for more info.
    >>> . . .)
    >>>
    >>>> For example: My JMC is catching 99.8% of all spam sent to me
    >>>> False Positive ratio is zero (none in six months)
    >>>>
    >>>> Why would I have to fiddle with it? All I do is mark those that are
    >>>> spam as spam, and unmark any it catches which are not
    >>>>
    >>>> No fiddling required - no pruning required.
    >>>>
    >>>> Now, you recommend that you mark good messages as non-junk, but then
    >>>> you recommend 'pruning' the training.dat. May I ask why?
    >>>
    >>> See above.
    >>>
    >>>> compared to mine it seems you are just wasting time and energy. Dont
    >>>> bloat the file in the first place and you wouldnt have to prune it
    >>>> later it seems to me.
    >>>
    >>> Given the nature of spam, it's going to bloat anyway. That's the
    >>> point of pruning it. The string |2340894-03089-34| might occur in
    >>> one spam, and |asdlijrq23098| in another, and so on and so forth --
    >>> this is the crap that bloats the file.
    >>>
    >>>> If your catch ratio was higher than mine, you might have a point.
    >>>> But it isnt.
    >>>
    >>> How do you know?
    >>>
    >>>> Likewise if your false positive ratio was lower than mine, but again
    >>>> it isnt.
    >>>
    >>> How do you know?
    >>>
    >>>> So what, besides the technical data available, is the value of the
    >>>> tool. I have it, but dont see a requirement for it at all.
    >>>
    >>> Dan, you and I will never agree on this. If you want to keep
    >>> deleting your training.dat when its performance begins to degrade,
    >>> and if you only want to mark spam, that's fine. But if, as a user, I
    >>> were to hear that Tb's JMC were as unstable as that, I'd really
    >>> question the point of them if I have to reset them every three or
    >>> four months.
    >>>
    >>> With my recommendation, you /maintain/ the database by pruning out
    >>> the cruft that's resulting in degraded performance and also by
    >>> feeding the database enough spam /and/ ham tokens that it can
    >>> properly filter mail.
    >>>
    >>> /b.
    >>>

    >>
    >>
    >> Sure, Bayesian filters need SOME 'good' mails to operate well, but they
    >> don't need ALL of them!
    >>
    >> Once again, my JMC is catching 99.8% of spam sent my way... So mine IS
    >> working well, very well indeed! Is yours at that percentage?
    >> For every 200 spam sent my way, ONE single solitary one makes it into
    >> my inbox, so please tell me why you think I need to mark 'good'
    >> messages as non-junk again?
    >>
    >> I have NEVER had to dump my training.dat, I tell people with larger
    >> training.dat files who are having problems to do so, yes, and it
    >> clears their problem!
    >>
    >> Well, IS your catch ratio BETTER than mine? Is it up to 99.8% or not?
    >> I know for a fact that your false positive ratio is NOT better,
    >> because mine is at zero, and you cant get better than that.
    >>
    >>
    >> Why 'tweak' something that is working at 99.8% efficiency? I don't have
    >> to prune anything, I don't have to adjust anything, I simply mark spam
    >> as spam, and unmark any that isn't. Real simple. I don't have to
    >> DUMP my training.dat, because I am NOT bloating it up with marking
    >> known good messages!
    >>
    >> The person in this thread did what you suggested and ended up with a
    >> 904KB training.dat after one day and 300 spam... and problems begin
    >> over the 1MB level. Yet my JMC has caught well in excess of 1000 spam
    >> in 6 months with NO false positives and my training.dat is 340KB - at
    >> this rate I could operate for over a year and still not reach what he
    >> did in ONE day! So tell me again, what value is there in marking known
    >> good messages?
    >>
    >> Heck, mine is at 99.8 percent efficiency, you cant even improve it much!
    >> I suppose you COULD be at 99.9% - seems a lot of work to catch one
    >> more spam. And no filter is 100%. Why should I use another entire
    >> program/utility and learn how to use it to catch one more spam? Not
    >> worth it.
    >>
    >> It's REALLY simple. All I have to do is mark spam that is spam as
    >> junk, and unmark any messages that are not spam as non junk. You have
    >> to do that as well. But that's where I am finished! Done, complete!
    >> While you have to go into the utility, tweak the filter, and prune
    >> your files. Lots of work, for what? ONE more spam you catch? And IS
    >> your catch ratio at 99.8 or better anyway? YOU are the one doing more
    >> work, not me.

    >
    > I'm simply not going to argue this with you any more, Dan. If you can't
    > or /won't/ understand how Bayesian filters work, that's fine, but please
    > don't go about telling people that they don't need to mark ham as well.
    > There is a reason -- even if you refuse to acknowledge it -- that, even
    > in the UI Junk Settings, it indicates that you need to mark both spam
    > and ham -- simply correcting false positives isn't entirely adequate.
    >
    > Since setting up this box six or so weeks back, my catch ratio has been
    > 100% with one false positive.
    >
    > I just pruned my training.dat again, deleting good and bad tokens
    > separately, using a threshold of 5 for the former and 20 for the latter,
    > which, ironically, gave me a total of 2 201 good tokens and 2 201 bad
    > tokens in a training.dat that's a whopping, um, 61 kB in size.
    >
    > Are you honestly asking us to believe that you have never, in the five
    > years or so that JMC have been part of Moz/Tb, deleted or reset your
    > training.dat and that it is a mere 340 kB in size even though you're
    > getting roughly 200 spam a month? That's something over 2 000 spam a
    > year; over five years, that would be over 10 000 spam.
    >
    > And yet, despite all the tokens that would have been generated and
    > despite the fact that you have never reset or pruned it, your
    > training.dat is merely 340 kB in size?
    >
    > You must excuse me if I find that somewhat . . . difficult . . . to
    > believe. . . .
    >
    > /b.
    >


    If you had read my posts, you would have noted that my training.dat was
    re-started when I got this machine - on 10 July 2006 it was zero. But of
    course you don't read my posts anyway.

  11. Re: Junk Mail Controls stopped working, mostly

    Brian Heinrich wrote:
    > On 2007-01-02 19:52 (-0700 UTC), Brian Heinrich wrote:
    >
    >
    >
    >> There is a reason -- even if you refuse to acknowledge it -- that,
    >> even in the UI Junk Settings, it indicates that you need to mark both
    >> spam and ham -- simply correcting false positives isn't entirely
    >> adequate.

    >
    > I've been playing with a Tuffmail trial account; in their Auto-Train/
    > folder, there is a default message that points to
    > .
    >
    > Note, in particular, the following:
    >
    >
    > *Bayesian Classifier*
    >
    > *IMPORTANT:* You have to train with at least 1 ham and 1 spam message
    > before the classifier will operate.
    >
    > *VERY IMPORTANT:* Enabling the classifier with a dozen ham messages and
    > hundreds of spam messages, or vice versa, will most surely result in
    > mis-classification of messages.
    >

    >
    > /b.
    >
    >
    >


    Yep, and that's taken care of when you start the program. All mail is
    marked as spam and you unmark one (or more)/restart/ all mail is not
    marked until you mark one.

    It says ONE each, it doesn't say thousands each! Again, my catch ratio is
    99.8%, NO false positives. And you want me to learn another utility,
    tweak something, and prune files regularly to get two spam? Not worth
    my while. It takes me oh, perhaps 2 seconds to mark those messages as
    junk - but you want me to spend minutes marking all my good messages as
    non-junk and pruning files?

    You mean to save 2 seconds a week or so you would gladly advocate
    spending minutes per week? I wouldn't call that being effective.

  12. Re: Junk Mail Controls stopped working, mostly

    Penthor-Mul wrote:
    > Moz Champion (Dan) wrote:
    >> Penthor-Mul wrote:
    >>> Moz Champion (Dan) wrote:
    >>>> Snagglepuss wrote:
    >>>>> Brian Heinrich wrote:
    >>>>>> On 2006-12-30 13:26 (-0700 UTC), Snagglepuss wrote:
    >>>>>>
    >>>>>>> Penthor-Mul wrote:
    >>>>>>>> I get a lot of spam, 50-75/day.
    >>>>>>>>
    >>>>>>>> Penthor
    >>>>>>>
    >>>>>>> is that all. I get around 300 a day, and thats from one
    >>>>>>> account. And I'm sure someone will say they get more.
    >>>>>>
    >>>>>> A friend of mine gets about 1 200 e-mail a day, about 400 of which
    >>>>>> make it to his inbox . . . and 30-40% of that amount are legitimate.
    >>>>>>
    >>>>>> /b.
    >>>>>>
    >>>>> well, there you go. I said someone would reply, didn't I :-)
    >>>>>
    >>>>
    >>>>
    >>>> I used to get 300 per day on my accounts, and that was eight years ago.
    >>>> So I did something about it... I fought back. I have much less now,
    >>>> and spam has increased 20 or 30 times since then.
    >>>>
    >>>> but still 904K after one day is still excessive. Even with 300 per
    >>>> day that means 3KB for each message! 3000 bytes for each message!
    >>>> Heck some spam isnt that big! And since much spam is duplicated have
    >>>> 904KB after 1 day (and 300 spam) is really excessive.
    >>>
    >>> You fought back? How so Dan? Pipe bombs
    >>> Seriously, how?
    >>>
    >>> Penthor

    >>
    >> Fighting back
    >>
    >> Get spam
    >> Go to spam site (to see if its still there!)
    >> do a traceroute to the spam site
    >> report the site to its host for spamming
    >>
    >> Do the same for any 'image' site used in the spam
    >> and for any 'sign-off' site as well.
    >>
    >> I use Visual Route for tracing, but it can be done with NeoTrace or
    >> just the normal utilities on any computer (trace route/dig etc)
    >>
    >> Its simple, its easy, and effective. Can reduce spam intake up to 99%
    >> (mine was reduced from 200/300 per day (and this in 1997 - it would be
    >> over 2000 today) to less than 10 per week.
    >>
    >> Most ISPs will take action within the week, some large ones take two
    >> weeks (AT&T and China.com for example). Spammer cant sell anything
    >> with no web site, hurts them where it counts.

    >
    > Oooooo, I hate to have to track down and report all those spams. But if
    > it will result in those kind of results..... Excuse my ignorance, what
    > is an 'image site' and 'sign off' site.



    An 'image site' is a site that hosts the images shown in the spam. The
    image is not sent as an attachment, but loaded as an online image.
    A 'sign off' site is a site that purports to take you off the mailing
    list (which you shouldn't be on in the first place).
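
For anyone wanting to collect the sites named in a spam before tracing them, here is a minimal Python sketch that pulls the distinct host names out of a message body. It only gathers candidates (the spam site, any image site, any sign-off site); the traceroute and the report to the host are still done by hand. The function name, the deliberately simple URL pattern, and the sample text with example.* domains are all made up for illustration.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: http(s) URLs up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def hosts_in_message(body):
    """Return the distinct host names linked in a message body, in order of appearance."""
    hosts = []
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


sample = (
    'Buy now at http://shop.example.com/deal '
    '<img src="http://img.example.net/a.gif"> '
    "To be removed: http://out.example.org/remove?id=1"
)
found = hosts_in_message(sample)
```

Each host in the result can then be fed to whatever tracing utility you prefer.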

  13. Re: Junk Mail Controls stopped working, mostly

    Penthor-Mul wrote:
    > Penthor-Mul wrote:
    >> Moz Champion (Dan) wrote:
    >>> Penthor-Mul wrote:
    >>>> Moz Champion (Dan) wrote:
    >>>>> Penthor-Mul wrote:
    >>>>>> Penthor-Mul wrote:
    >>>>>>> Snagglepuss wrote:
    >>>>>>>> Penthor-Mul wrote:
    >>>>>>>>> Moz Champion (Dan) wrote:
    >>>>>>>>>> Penthor-Mul wrote:
    >>>>>>>>>>> version 1.5.0.9 (20061207)
    >>>>>>>>>>>
    >>>>>>>>>>> Until very recently they worked pretty good. Now about 50%
    >>>>>>>>>>> of the spam ends up in my inbox. I deleted all the .msf
    >>>>>>>>>>> files in my profile, emptied the Trash, and compacted my
    >>>>>>>>>>> folders. These actions don't seem to have helped. Is there
    >>>>>>>>>>> anything else I can try? Is this problem related to a recent
    >>>>>>>>>>> update?
    >>>>>>>>>>>
    >>>>>>>>>>> Penthor
    >>>>>>>>>>
    >>>>>>>>>> How big is your training.dat file? Your training.dat file can
    >>>>>>>>>> be found in your profile.
    >>>>>>>>>>
    >>>>>>>>>> Is JMC (Junk Mail Controls) still getting some spam/junk? Have
    >>>>>>>>>> you checked the settings to ensure that it is still enabled on
    >>>>>>>>>> all accounts?
    >>>>>>>>>
    >>>>>>>>> Training.dat = 6MB
    >>>>>>>>>
    >>>>>>>>> The JMC is catching about 50% of the spam and correctly routing
    >>>>>>>>> it to the Junk folder. JMC is still activated for my account.
    >>>>>>>>>
    >>>>>>>>> I remember there was a place to set the "aggressiveness" of the
    >>>>>>>>> JMC between 0 & 100, but I forgot where that setting is. I
    >>>>>>>>> think I had set it fairly aggressive; about 5 or less.
    >>>>>>>>>
    >>>>>>>>> N
    >>>>>>>> SIX MEGS!!!! Yikes. I would delete the file and start again.
    >>>>>>>
    >>>>>>> OK, I'll shut down and dump that bad boy. I had no idea there
    >>>>>>> was a size limit for that file. I guess I missed that. Thanks
    >>>>>>> for the education.
    >>>>>>
    >>>>>> OK, I renamed training.dat to training.old. The new training.dat
    >>>>>> was created. It is already 904KB. The jury is still out on how
    >>>>>> effective the filter is now. I'll know in a week or so; I get a
    >>>>>> lot of spam, 50-75/day.
    >>>>>>
    >>>>>> Penthor
    >>>>>
    >>>>>
    >>>>> Tell me, are you marking non junk as non junk with the aid of Mnenhy?
    >>>>> My training.dat is 380KB and thats after 6 months! (Active since July)
    >>>>> 904KB after one day is really excessive
    >>>>
    >>>> Sorry for the delay - out of town.
    >>>>
    >>>> I am marking the good messages as "not junk". I believe my
    >>>> training.dat started at over 500KB after it was re-created. I am
    >>>> still having to mark some spam that ends up in inbox...but I think
    >>>> it has decreased. Progress, even a little, is good, yes?
    >>>>
    >>>> Penthor
    >>>
    >>>
    >>> No, I disagree with marking good messages as non-junk. It bloats the
    >>> training.dat for no good reason. And a large training.dat has shown
    >>> to be prone to making JMC less effective.
    >>>
    >>> Marking good messages as non junk will decrease the occurrence of
    >>> false positives somewhat... but in most cases thats so low in the
    >>> first place, its not even a problem. I havent had a real 'false'
    >>> positive on this system yet - with my 'in address book' exception
    >>> anyway.
    >>>
    >>>
    >>> Again, training.dat files over 1MB have been shown to impart less
    >>> effectiveness, and here you are with ONE day and a training.dat of
    >>> 940K? Almost at the limit at which problem may occur, after only one
    >>> day? Thats what marking known good messages does.
    >>>
    >>> Once more. I have a training.dat file of 380K... after over six
    >>> months of use (10 Jul 2006 first use) - and my catch ratio is 99.8%
    >>> (for every spam I see in my inbox, JMC has already caught 200). I
    >>> have not had a false positive in six months, so I figure thats a low
    >>> enough ratio , you cant get better than 0.
    >>>
    >>> My suggestion to you is to desist with marking good emails as junk

    >>
    >> OK, I will try that. I renamed training.dat again (it was over 3MB
    >> already) and it was recreated. I must have been mistaken the first
    >> time about the initial size, because it was 50KB this time. I trained
    >> about 100 mails and it is 91KB now. I'll monitor the size. Thanks
    >> for all the advice.

    >
    > And I have no idea what my efficiency really is. I have about 13
    > folders that receive filtered mail, not including the default folders. I
    > don't always get to each folder each day to read the newsletters, etc.
    > I only really try to keep up with the counts in my Inbox and Junk folders.


    OK, I've gone through about 75 mails (maybe 8 hams) and have had 4 false
    positives. I'm only marking the unmarked junk and the false positives.
    My training.dat is 91KB.
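
If you want to keep an eye on the file size without opening the profile folder each time, a small Python sketch like the following would do. The 1 MB figure is only the rule of thumb quoted in this thread, not a documented limit, and the function name and any path you pass in are your own.

```python
import os

ONE_MB = 1024 * 1024  # rule-of-thumb limit quoted in this thread, not an official figure


def training_dat_status(path, limit=ONE_MB):
    """Return (size_in_bytes, is_over_limit) for the given training.dat path."""
    size = os.path.getsize(path)
    return size, size > limit


# Usage (the path is hypothetical; point it at training.dat in your own profile):
# size, too_big = training_dat_status("/path/to/profile/training.dat")
# print(f"{size / 1024:.0f} KB, over limit: {too_big}")
```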

  14. Re: Junk Mail Controls stopped working, mostly

    On 2007-01-03 03:52 (-0700 UTC), Moz Champion (Dan) wrote:



    > If you had read my posts you would have noted that my training.dat was
    > re-started when I got this machine - on 10 July 2006 it was zero. But of
    > course you dont read my posts anyway.


    Dan, don't be an ass. I indicated that I thought you'd posted somewhere
    that you'd reset your training.dat about six months ago. Rather than
    indicate that your training.dat was only six months old, you replied that
    you'd never done so. (Which, of course, still begs the question of the
    previous four-and-a-half years. . . .)

    I'm neither going to argue nor deny your claim that your training.dat is only
    340 kB. It still strikes me as being suspiciously small, but it is possible
    if you get a lot of spam that doesn't try to poison the filters or that
    doesn't contain random strings.

    That said, you've still not provided any documentary evidence to support
    your claim that ham doesn't need to be marked as such in order for JMC to
    function well. . . .

    /b.


  15. Re: Junk Mail Controls stopped working, mostly

    On 2007-01-03 03:58 (-0700 UTC), Moz Champion (Dan) wrote:

    > Brian Heinrich wrote:
    >> On 2007-01-02 19:52 (-0700 UTC), Brian Heinrich wrote:
    >>
    >>
    >>
    >>> There is a reason -- even if you refuse to acknowledge it -- that,
    >>> even in the UI Junk Settings, it indicates that you need to mark both
    >>> spam and ham -- simply correcting false positives isn't entirely
    >>> adequate.

    >>
    >> I've been playing with a Tuffmail trial account; in their Auto-Train/
    >> folder, there is a default message that points to
    >> .
    >>
    >> Note, in particular, the following:
    >>
    >>
    >> *Bayesian Classifier*
    >>
    >> *IMPORTANT:* You have to train with at least 1 ham and 1 spam message
    >> before the classifier will operate.
    >>
    >> *VERY IMPORTANT:* Enabling the classifier with a dozen ham messages
    >> and hundreds of spam messages, or vice versa, will most surely result
    >> in mis-classification of messages.
    >>

    >>
    >> /b.
    >>
    >>

    >
    > Yep, and that's taken care of when you start the program. All mail is
    > marked as spam and you unmark one (or more)/restart/ all mail is not
    > marked until you mark one.


    I'm not talking about Thumperbunny now; I'm talking about Tuffmail's Web client.

    > It says ONE each, [ . . . ]


    Actually, it says 'You have to train with /at least/ 1 ham and 1 spam
    message before the classifier will operate.'

    > [ . . . ] it doesnt say thousands each!


    You're being wilfully obtuse, Dan, not to mention blowing smoke about
    something which you have stated you have not used.

    I never said anything about marking thousands of ham. What I said is that
    there should be a relatively balanced number of good and bad tokens in order
    for JMC to function properly.

    When, from a term, I run |java -jar bayesjunktool-0.2.1.jar| and select my
    training.dat, I get:

    The number of good messages processed is 449
    The number of bad messages processed is 11295
    Now processing 2201 good tokens.........
    Now processing 2201 bad tokens.........
    Merging token lists...
    Launching GUI...

    A lot of those good messages (pro'ly 2/3) are in fact false positives from
    two-and-a-half years ago.

    > Again, my catch ratio is
    > 99.8%, NO false positives. And you want me to learn another utility,
    > tweak something, and prune files regularly to get two spam? Not worth
    > my while. It takes me oh, perhaps 2 seconds to mark those messages as
    > junk - but you want me to spend minutes marking all my good messages as
    > junk and pruning files?
    >
    > You mean to save 2 seconds a week or so you would gladly advocate
    > spending minutes per week? I wouldnt call that being effective.


    Realistically, Dan, I couldn't care less about you and what you do on your own
    machine. What I do care about is the advice being given to posters in this
    group -- the guy whose training.dat was 900 kB after one day, or the people
    whose training.dat have swollen to several MB. For them, this is a viable
    solution -- and, I would suggest, a better solution than pushing the little
    button that lets you reset your training.dat.

    You and I have been having this 'discussion' for over three-and-a-half years
    -- since before Straxus developed the BJT. It started when I indicated that
    the performance of my JMC had significantly degraded. Your response, of
    course, was to state that yours was working fine and that mine must somehow
    have been corrupted and to delete it and start anew -- which is the advice
    you continue to give to this day.

    Once the BJT was released, I realised that the issue wasn't that my
    training.dat had become corrupted but, rather, that it had /way/ too much
    cruft in it.

    If I can maintain the efficiency of JMC by seeding the database with good
    tokens every now and again and by occasional, um, maintenance, that strikes
    me as being, over all, a better solution than waiting till JMC becomes
    bloated or its performance degrades, then deleting it and starting over
    again. After all, I already have both good and bad tokens. I don't need to
    delete/reset and start over; I just need to get rid of the cruft that's
    messing with the scoring.

    If you were to execute that JAR and have a look at the contents of your
    training.dat, you might have a better sense of why I keep suggesting that
    people use the BJT. Otherwise, you're just blowing smoke and your continued
    assertion that your JMC are functioning at 99.8% (did you actually take the
    time to calculate that?) without marking ham as ham, &c, is pretty much
    meaningless: that it 'WFM' doesn't mean that it's working for others, nor
    that it's working in the way in which it was designed to work. . . .
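
For what it's worth, the calculation itself is short. Here is a minimal Python sketch using the figure quoted in this thread (200 spam caught for every one that reaches the inbox); the function name is hypothetical and no real mail counts are involved.

```python
def catch_ratio(caught, missed):
    """Fraction of all spam received that the filter caught."""
    return caught / (caught + missed)


# 200 caught for every 1 that reaches the inbox, as quoted in the thread.
quoted = round(catch_ratio(200, 1) * 100, 1)

# What a 99.8% catch ratio would actually take: 499 caught per miss.
needed = round(catch_ratio(499, 1) * 100, 1)
```

On those numbers, one miss per 200 caught works out to about 99.5%, not 99.8%; the latter would need roughly one miss per 499 caught.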

    /b.


  16. Re: Junk Mail Controls stopped working, mostly

    Moz Champion (Dan) wrote:
    > Penthor-Mul wrote:
    >> Moz Champion (Dan) wrote:
    >>> Penthor-Mul wrote:
    >>>> Penthor-Mul wrote:
    >>>>> Snagglepuss wrote:
    >>>>>> Penthor-Mul wrote:
    >>>>>>> Moz Champion (Dan) wrote:
    >>>>>>>> Penthor-Mul wrote:
    >>>>>>>>> version 1.5.0.9 (20061207)
    >>>>>>>>>
    >>>>>>>>> Until very recently they worked pretty good. Now about 50% of
    >>>>>>>>> the spam ends up in my inbox. I deleted all the .msf files in
    >>>>>>>>> my profile, emptied the Trash, and compacted my folders. These
    >>>>>>>>> actions don't seem to have helped. Is there anything else I
    >>>>>>>>> can try? Is this problem related to a recent update?
    >>>>>>>>>
    >>>>>>>>> Penthor
    >>>>>>>>
    >>>>>>>> How big is your training.dat file? Your training.dat file can be
    >>>>>>>> found in your profile.
    >>>>>>>>
    >>>>>>>> Is JMC (Junk Mail Controls) still getting some spam/junk? Have
    >>>>>>>> you checked the settings to ensure that it is still enabled on
    >>>>>>>> all accounts?
    >>>>>>>
    >>>>>>> Training.dat = 6MB
    >>>>>>>
    >>>>>>> The JMC is catching about 50% of the spam and correctly routing
    >>>>>>> it to the Junk folder. JMC is still activated for my account.
    >>>>>>>
    >>>>>>> I remember there was a place to set the "aggressiveness" of the
    >>>>>>> JMC between 0 & 100, but I forgot where that setting is. I think
    >>>>>>> I had set it fairly aggressive; about 5 or less.
    >>>>>>>
    >>>>>>> N
    >>>>>> SIX MEGS!!!! Yikes. I would delete the file and start again.
    >>>>>
    >>>>> OK, I'll shut down and dump that bad boy. I had no idea there was
    >>>>> a size limit for that file. I guess I missed that. Thanks for the
    >>>>> education.
    >>>>
    >>>> OK, I renamed training.dat to training.old. The new training.dat
    >>>> was created. It is already 904KB. The jury is still out on how
    >>>> effective the filter is now. I'll know in a week or so; I get a lot
    >>>> of spam, 50-75/day.
    >>>>
    >>>> Penthor
    >>>
    >>>
    >>> Tell me, are you marking non junk as non junk with the aid of Mnenhy?
    >>> My training.dat is 380KB and thats after 6 months! (Active since July)
    >>> 904KB after one day is really excessive

    >>
    >> Sorry for the delay - out of town.
    >>
    >> I am marking the good messages as "not junk". I believe my
    >> training.dat started at over 500KB after it was re-created. I am
    >> still having to mark some spam that ends up in inbox...but I think it
    >> has decreased. Progress, even a little, is good, yes?
    >>
    >> Penthor

    >
    >
    > No, I disagree with marking good messages as non-junk. It bloats the
    > training.dat for no good reason. And a large training.dat has shown to
    > be prone to making JMC less effective.
    >
    > Marking good messages as non junk will decrease the occurrence of false
    > positives somewhat... but in most cases thats so low in the first place,
    > its not even a problem. I havent had a real 'false' positive on this
    > system yet - with my 'in address book' exception anyway.
    >
    >
    > Again, training.dat files over 1MB have been shown to impart less
    > effectiveness, and here you are with ONE day and a training.dat of 904K?
    > Almost at the limit at which problems may occur, after only one day?
    > That's what marking known good messages does.
    >
    > Once more. I have a training.dat file of 380K... after over six months
    > of use (10 Jul 2006 first use) - and my catch ratio is 99.8% (for every
    > spam I see in my inbox, JMC has already caught 200). I have not had a
    > false positive in six months, so I figure that's a low enough ratio;
    > you can't get better than 0.
    >
    > My suggestion to you is to desist from marking good emails as non-junk.


    Dan,

    I have a quick question. Do you use the "Do not mark messages as
    junk mail if the sender is in my address book:" option in JMC?

    - Andrew W Applegarth

  17. Re: Junk Mail Controls stopped working, mostly

    Andrew W Applegarth wrote:

    > Dan,
    >
    > I have a quick question. Do you use the "Do not mark messages as
    > junk mail if the sender is in my address book:" option in JMC?
    >
    > - Andrew W Applegarth


    Yes

  18. Re: Junk Mail Controls stopped working, mostly

    Brian Heinrich wrote:
    > On 2007-01-03 03:58 (-0700 UTC), Moz Champion (Dan) wrote:
    >
    >> Brian Heinrich wrote:
    >>> On 2007-01-02 19:52 (-0700 UTC), Brian Heinrich wrote:
    >>>
    >>>
    >>>
    >>>> There is a reason -- even if you refuse to acknowledge it -- that,
    >>>> even in the UI Junk Settings, it indicates that you need to mark
    >>>> both spam and ham -- simply correcting false positives isn't
    >>>> entirely adequate.
    >>>
    >>> I've been playing with a Tuffmail trial account; in their Auto-Train/
    >>> folder, there is a default message that points to
    >>> .
    >>>
    >>> Note, in particular, the following:
    >>>
    >>>
    >>> *Bayesian Classifier*
    >>>
    >>> *IMPORTANT:* You have to train with at least 1 ham and 1 spam message
    >>> before the classifier will operate.
    >>>
    >>> *VERY IMPORTANT:* Enabling the classifier with a dozen ham messages
    >>> and hundreds of spam messages, or vice versa, will most surely result
    >>> in mis-classification of messages.
    >>>

    >>>
    >>> /b.
    >>>
    >>>

    >>
    >> Yep, and that's taken care of when you start the program. All mail is
    >> marked as spam and you unmark one (or more)/restart/ all mail is not
    >> marked until you mark one.

    >
    > I'm not talking about Thumperbunny now; I'm talking about Tuffmail's Web
    > client.
    >
    >> It says ONE each, [ . . . ]

    >
    > Actually, it says 'You have to train with /at least/ 1 ham and 1 spam
    > message before the classifier will operate.'
    >
    >> [ . . . ] it doesn't say thousands each!

    >
    > You're being wilfully obtuse, Dan, not to mention blowing smoke about
    > something which you have stated you have not used.
    >
    > I never said anything about marking thousands of ham. What I said is
    > that there should be a relatively balanced number of good and bad tokens
    > in order for JMC to function properly.
    >
    > When, from a term, I run |java -jar bayesjunktool-0.2.1.jar| and select
    > my training.dat, I get:
    >
    > The number of good messages processed is 449
    > The number of bad messages processed is 11295
    > Now processing 2201 good tokens.........
    > Now processing 2201 bad tokens.........
    > Merging token lists...
    > Launching GUI...
    >
    > A lot of those good messages (pro'ly 2/3) are in fact false positives
    > from two-and-a-half years ago.
    >
    >> Again, my catch ratio is 99.8%, NO false positives. And you want me to
    >> learn another utility, tweak something, and prune files regularly to
    >> get two spam? Not worth my while. It takes me oh, perhaps 2 seconds
    >> to mark those messages as junk - but you want me to spend minutes
    >> marking all my good messages as non-junk and pruning files?
    >>
    >> You mean to save 2 seconds a week or so you would gladly advocate
    >> spending minutes per week? I wouldn't call that being effective.

    >
    > Realistically, Dan, I couldn't care less about you and what you do on your
    > own machine. What I do care about is the advice being given to posters
    > in this group -- the guy whose training.dat was 900 kB after one day, or
    > the people whose training.dat files have swollen to several MB. For them,
    > this is a viable solution -- and, I would suggest, a better solution
    > than pushing the little button that lets you reset your training.dat.
    >
    > You and I have been having this 'discussion' for over three-and-a-half
    > years -- since before Straxus developed the BJT. It started when I
    > indicated that the performance of my JMC had significantly degraded.
    > Your response, of course, was to state that yours was working fine and
    > that mine must somehow have been corrupted and to delete it and start
    > anew -- which is the advice you continue to give to this day.
    >
    > Once the BJT was released, I realised that the issue wasn't that my
    > training.dat had become corrupted but, rather, that it had /way/ too
    > much cruft in it.
    >
    > If I can maintain the efficiency of JMC by seeding the database with
    > good tokens every now and again and by occasional, um, maintenance, that
    > strikes me as being, over all, a better solution than waiting till JMC
    > becomes bloated or its performance degrades, then deleting it and
    > starting over again. After all, I already have both good and bad
    > tokens. I don't need to delete/reset and start over; I just need to get
    > rid of the cruft that's messing with the scoring.
    >
    > If you were to execute that JAR and have a look at the contents of your
    > training.dat, you might have a better sense of why I keep suggesting
    > that people use the BJT. Otherwise, you're just blowing smoke and your
    > continued assertions that your JMC is functioning at 99.8% (did you
    > actually take the time to calculate that?) without marking ham as ham,
    > &c, is pretty much meaningless: that it 'WFM' doesn't mean that it's
    > working for others, nor that it's working in the way in which it was
    > designed to work. . . .
    >
    > /b.
    >

    You want to talk about another program go to its newsgroup. I was
    speaking of Junk Mail Controls in Mozilla.
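
For anyone following the ham/spam balance argument above, here is a minimal sketch of a Graham-style per-token spam probability (after Paul Graham's "A Plan for Spam"). It is an assumption that this approximates what JMC does internally; the function name and the balanced 1000/1000 corpus are made up for illustration, while the 449/11295 message counts are the ones Brian quotes from his own training.dat.

```python
def token_spam_probability(good_count, bad_count, n_good, n_bad):
    """Graham-style spam probability for one token (illustrative, not JMC's code).

    good_count / bad_count: how often the token appeared in ham / spam.
    n_good / n_bad: how many ham / spam messages were trained in total.
    """
    if n_good < 1 or n_bad < 1:
        # Mirrors the "at least 1 ham and 1 spam" requirement quoted above.
        raise ValueError("need at least one ham and one spam message")
    g = 2 * good_count  # Graham doubles ham counts to bias against false positives
    b = bad_count
    good_freq = min(1.0, g / n_good)
    bad_freq = min(1.0, b / n_bad)
    if good_freq + bad_freq == 0:
        return 0.4  # Graham's default for a token never seen before
    p = bad_freq / (good_freq + bad_freq)
    return max(0.01, min(0.99, p))  # clamp so no token is ever fully decisive


# Same raw token counts, two different training corpora.
balanced = token_spam_probability(5, 5, n_good=1000, n_bad=1000)
skewed = token_spam_probability(5, 5, n_good=449, n_bad=11295)
print(f"balanced corpus: {balanced:.3f}")
print(f"skewed corpus:   {skewed:.3f}")
```

The point of the sketch is only that the score of a token depends on the ham and spam totals as well as on the token's own counts, so a heavily lopsided training set shifts every score.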

  19. Re: Junk Mail Controls stopped working, mostly

    Brian Heinrich wrote:
    > On 2007-01-03 03:52 (-0700 UTC), Moz Champion (Dan) wrote:
    >
    >
    >
    >> If you had read my posts you would have noted that my training.dat was
    >> re-started when I got this machine - on 10 July 2006 it was zero. But
    >> of course you don't read my posts anyway.

    >
    > Dan, don't be an ass. I indicated that I thought you'd posted somewhere
    > that you'd reset your training.dat about six months ago. Rather than
    > indicate that your training.dat was only six months old, you replied
    > that you'd never done so. (Which, of course, still begs the question of
    > the previous four-and-a-half years. . . .)
    >
    > I'm going to neither argue nor deny your claim that your training.dat is
    > only 340 kB. It still strikes me as being suspiciously small, but it is
    > possible if you get a lot of spam that doesn't try to poison the filters
    > or that doesn't contain random strings.
    >
    > That said, you've still not provided any documentary evidence to support
    > your claim that ham doesn't need to be marked as such in order for JMC
    > to function well. . . .
    >
    > /b.
    >


    Nope, I have never intentionally reset my training.dat. It was zero when
    I installed Thunderbird on this machine... it wasn't even there.

    And it's 340KB right now.

    What sort of documentation do you want? My catch ratio is 99.8% - I get
    no false positives - my training.dat is 340KB.
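
Since the 99.8% figure keeps coming up, here is a rough sketch of how a catch ratio could be worked out from counts of caught and missed spam. The function names are hypothetical and the counts are only the ones mentioned in this thread ("for every spam I see in my inbox, JMC has already caught 200"); nothing here is measured.

```python
def catch_rate(caught, missed):
    """Fraction of all spam that the filter caught."""
    total = caught + missed
    if total == 0:
        raise ValueError("no spam counted yet")
    return caught / total


def caught_per_miss(rate):
    """How many spam must be caught per missed spam to reach a given rate."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    return rate / (1 - rate)


# 200 caught for every 1 that reaches the inbox:
print(f"{catch_rate(200, 1):.1%}")
# Caught-per-miss ratio that a 99.8% catch rate would imply:
print(round(caught_per_miss(0.998)))
```

Under these assumptions, 200 caught per miss works out to roughly 99.5%, while 99.8% would correspond to about 499 caught per miss, so the two figures quoted in the thread are close but not the same number.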

  20. Re: Junk Mail Controls stopped working, mostly

    Moz Champion (Dan) wrote:
    > Andrew W Applegarth wrote:
    >
    >> Dan,
    >>
    >> I have a quick question. Do you use the "Do not mark messages as
    >> junk mail if the sender is in my address book:" option in JMC?
    >>
    >> - Andrew W Applegarth

    >
    > Yes


    I also have that option enabled.
