This is a discussion on Re: PDFText Plugin for PDF file scoring - not for PDF images - SpamAssassin
James MacLean wrote:
> Hi folks,
> Regrets if this is the wrong list.
> Wanted to be able to score on text found in PDF files. Did not see any
> obvious route, so made a plugin that calls XPDF's pdfinfo and
> pdftotext to get the text that is then scored.
> Sample local.cf could be :
> pdftotext_cmd /usr/local/bin/pdftotext
> pdfinfo_cmd /usr/local/bin/pdfinfo
> body PDF_TO_TEXT eval:check_pdftext("^Error","sex","drugs",'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice .org
> Note that a :4 suffix gives a match on that regex 4 points.
> Really don't know if this was the right road to follow, as I copied
> the AntiVirus.pm and came up with this:
> So far... it appears to work as expected and didn't take down a pretty
> busy server.
> Would enjoy hearing any positive criticism.
I did this the other day with CAM::PDF, but Theo recommended that this work
be done in the post_message_parse() plugin call. Then you could
just write body rules against the text, URIs would get checked by the
URIDNSBL plugin, etc.
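
For readers trying to picture the "regex:N" convention from the quoted sample rule, here is a rough Python sketch of how such scoring might work. It is not the plugin's actual code (the plugin is Perl and its source is not shown in this thread). The names split_pattern, score_text and pdf_text are hypothetical, and the default of 1 point for a pattern without a suffix is an assumption, since the post only says what ":4" does. The pdf_text helper assumes pdftotext is installed at the path given in the sample local.cf and relies on "-" as the output file to send the text to stdout.

```python
import re
import subprocess


def split_pattern(spec, default_score=1.0):
    """Split 'regex:N' into (regex, N).

    A spec without a trailing ':N' keeps the whole string as the regex and
    gets default_score (an assumption, not stated in the original post).
    """
    m = re.fullmatch(r"(.*):(\d+(?:\.\d+)?)", spec, flags=re.DOTALL)
    if m:
        return m.group(1), float(m.group(2))
    return spec, default_score


def score_text(text, specs, default_score=1.0):
    """Sum the points of every pattern that matches somewhere in text."""
    total = 0.0
    for spec in specs:
        pattern, points = split_pattern(spec, default_score)
        # MULTILINE lets '^' anchor at the start of any line of the output.
        if re.search(pattern, text, flags=re.MULTILINE):
            total += points
    return total


def pdf_text(path, pdftotext_cmd="/usr/local/bin/pdftotext"):
    """Return the text of a PDF by running pdftotext with stdout as output."""
    result = subprocess.run(
        [pdftotext_cmd, path, "-"], capture_output=True, text=True
    )
    return result.stdout
```

With text that contains "drugs" and a "Title:" line naming stock_tmp.pdf, the first four patterns from the sample rule would add up to 5 points under these assumptions: 1 for "drugs" and 4 for the title match.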