Error while sa-learning - SpamAssassin


Thread: Error while sa-learning

  1. Error while sa-learning

    (again as new mail)
    Hey list,

    I get lots of these errors while passing an mbox file to sa-learn for
    spam learning:

    Malformed UTF-8 character (unexpected non-continuation byte 0x72,
    immediately after start byte 0xf3) in transliteration (tr///) at
    /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1049.
    Malformed UTF-8 character (unexpected non-continuation byte 0x20,
    immediately after start byte 0xe1) in transliteration (tr///) at
    /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1050.

    The non-continuation byte and start byte vary, but every error comes
    from lines 1049 and 1050 of Message.pm.
    The process finishes normally and tokens are learned, so I assume some
    of the messages within the mbox file are somehow corrupted.
    It started today after I added a bunch of new spammy msgs I collected.
    What does the error mean and how can I identify the mails with the problem?

    Regards
    /Diego


  2. Re: Error while sa-learning

    Diego Pomatta wrote:
    > (again as new mail)
    > Hey list,
    >
    > I get lots of these errors while passing a mbox file to sa-learn for
    > spam learning:
    >
    > Malformed UTF-8 character (unexpected non-continuation byte 0x72,
    > immediately after start byte 0xf3) in transliteration (tr///) at
    > /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1049.
    > Malformed UTF-8 character (unexpected non-continuation byte 0x20,
    > immediately after start byte 0xe1) in transliteration (tr///) at
    > /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1050.
    >
    > with variations in non-continuation byte and start byte, but all in
    > lines 1049 and 1050 of Message.pm
    > The process finishes well and tokens are learned, so I assume it's
    > some of the messages within the mbox file that are somehow corrupted.
    > It started today after I added a bunch of new spammy msgs I collected.
    > What does the error mean and how can I identify the mails with the
    > problem?

    What perl version are you running? I suspect this is related to a
    common bug in perl 5.8.6.

    It can be kludged by adding a "use bytes" to Message.pm, but that hurts
    performance a bit.

    See also:
    https://issues.apache.org/SpamAssass...ug.cgi?id=3787

    (note: that bug is actually about it cropping up in rules, but it is
    likely the same root cause unless you're running perl 5.8.8)
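
On the two questions in the original post: the warning means perl was treating a string as UTF-8 and hit a byte that cannot continue a multi-byte sequence. 0xf3 and 0xe1 are "ó" and "á" in ISO-8859-1, so the pattern is consistent with Latin-1 text being read as UTF-8. To find candidate messages, a rough sketch in Python (not part of SpamAssassin; the function names are made up for illustration) is to flag every message in the mbox whose raw bytes are not valid UTF-8. This assumes the warnings come from such 8-bit content, so it may flag more messages than the ones perl actually complains about.

```python
import mailbox
import sys


def first_invalid_utf8(data):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def suspicious_messages(mbox_path):
    """Yield (index, offset, byte, context) for messages that are not valid UTF-8.

    Assumption: raw 8-bit bytes that are not valid UTF-8 are what trigger
    the perl warning. This is a heuristic, not what sa-learn itself checks.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.keys()):
            raw = box.get_bytes(key)
            bad = first_invalid_utf8(raw)
            if bad is not None:
                offset, byte = bad
                # A few bytes around the problem help locate it by eye.
                context = raw[max(0, offset - 20):offset + 20]
                yield index, offset, byte, context
    finally:
        box.close()


if __name__ == "__main__" and len(sys.argv) == 2:
    for index, offset, byte, context in suspicious_messages(sys.argv[1]):
        print("message %d: byte 0x%02x at offset %d: %r"
              % (index, byte, offset, context))
```

Run it with the mbox path as the only argument. The message index counts from 0 in mbox order, and the printed context shows the bytes surrounding the first offending byte in each flagged message.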
