Dave Trudgian wrote:
Simon Waters wrote:
Wetware rules!
Nah, we are only good at spotting the obvious spams; the computers do that much quicker anyway. I suspect we are only doing better because the spam filters tend to ignore what they don't understand, whereas we count some of it against the message.
There is a commonly held view that an automatic spam filter shouldn't produce any false positives. This generally prevents people from implementing spam filters that generalise a large amount. Such filters are much better at classifying spam, but could misclassify some ham.
Misclassification isn't as big a problem if the result is a 5xx error, since the sender then finds out that the message was refused. I think the problem is that we've tended to "post filter" after the MTA rather than before it. Although if you have back-up MXs, or otherwise destroy the point-to-point model, you could misclassify the bounce of a misclassified genuine message.
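As a rough sketch of what rejecting at SMTP time means: the filter scores the message while the connection is still open and answers with a permanent failure instead of accepting and filing it. The function name, the score scale and the threshold here are all made up for illustration; only the 550/250 reply codes are standard SMTP.

```python
def smtp_reply(spam_score, reject_at=10.0):
    """Pick an SMTP reply for a message scored during the SMTP session.

    spam_score and reject_at are hypothetical; any scoring scheme would do.
    """
    if spam_score >= reject_at:
        # Permanent failure: the sending MTA, not us, has to notify the sender.
        return "550 5.7.1 Message rejected as spam"
    # Below the threshold the message is accepted for delivery as usual.
    return "250 2.0.0 Message accepted"
```

A genuine sender whose mail is misclassified gets the refusal back from their own MTA, rather than the message vanishing into a spam folder.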
A recent MIT spam conference discussed this. Those present seemed to think that we need to change our ideas about what is acceptable performance from spam filters. I'm sceptical that people will start treating false positives as acceptable in order to get a spam filter that generalises well. We shall see how things develop.
Having gone with TMDA, I have false positives, but they are largely machine-generated emails; not all false positives are equal.
For example, quite a lot of the spam getting past SpamAssassin deliberately misspells all the obvious keywords - well, I spot "Vaigra" and hit delete. Since quick and effective spell-checking tools exist, I dare say this is a class of spam we could kill if anyone cared enough to code it.
Perfectly feasible stuff; I looked at doing this, in fact. The trouble is getting the balance right between catching a few more spam mails and taking longer to work out what features to classify on.
Err on the side of better classification. Computational efficiency is good, but the filter needs to be effective first.
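The misspelled-keyword idea discussed above could look something like the following minimal sketch. It is not how SpamAssassin or any existing filter does it; it uses Python's standard difflib as a stand-in for a spell checker, and the keyword list, function name and cutoff are assumptions picked for the example.

```python
import difflib
import re

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = ["viagra", "mortgage", "pharmacy"]


def obfuscated_keywords(text, keywords=SPAM_KEYWORDS, cutoff=0.8):
    """Return (word, keyword) pairs where word is a near-miss spelling of a keyword."""
    hits = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in keywords:
            # Exact spellings are left to ordinary keyword rules.
            continue
        # Closest keyword by difflib's similarity ratio, if it clears the cutoff.
        match = difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff)
        if match:
            hits.append((word, match[0]))
    return hits
```

Calling `obfuscated_keywords("Cheap Vaigra here")` flags "vaigra" as a near miss for "viagra". The cost mentioned above shows up directly: every word is compared against every keyword, and a lower cutoff catches more variants at the price of more false matches on innocent words.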
A quote from Paul Graham's "A Plan for Spam": "The Achilles heel of the spammers is their message. They can circumvent any other barrier you set up. They have so far, at least. But they have to deliver their message, whatever it is. If we can write software that recognises their messages, there is no way they can get around that."
Too simple - the message is too easy to hide from the machines.
If you can come up with something that filters well on content then the message doesn't get through.
This isn't my experience.