So What Makes a Good Spam Filter Anyway?
By: Developer Shed
    2004-11-03

    by Alan Hearnshaw

    Spam filters. Most of us know we need one. Some of us know we need a better one, but how many of us stop to think about what actually makes a good spam filter in the first place?

    This is not just a rhetorical question. It is a question that many users, and many developers, do not ask, and consequently it largely remains unanswered.

    Perhaps the question is best answered by defining the qualities of the perfect spam filter. We’ll call our perfect spam filter the “SpamSplatter 3000”. Here are some of the defining qualities of the “SpamSplatter 3000”:

    1. It requires zero interaction from the user.
    2. It produces zero false positives (good messages identified as bad) and zero false negatives (bad messages identified as good).
    3. It is transparent: you only ever see good messages and never even need to be aware that spam exists.

    That’s it. Not much of a shopping list, is it?

    Of course, the “SpamSplatter 3000” hasn’t been invented yet (and if it ever is, I want a piece of the action), but it does give us a frame of reference when looking for the best filter we can find.

    Let’s take each point in turn:

    It requires zero interaction from the user

    Two kinds of filter currently come close to this ideal: Bayesian filters and community filters.

    Bayesian filters break messages down into small “word bites”, or tokens, and maintain a database containing lists of good and bad tokens. When a new message arrives, the filter breaks it down into tokens, compares them with the database, and applies a formula based on Bayes’ theorem of conditional probability, named after the 18th-century English mathematician Thomas Bayes.
    Over time, the Bayesian filter “learns” the characteristics of spam messages.
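The tokenize, count and combine loop can be sketched in a few lines. This is a minimal, simplified naive Bayes example, not the algorithm of any particular product: the tokenizer regex, the add-one (Laplace) smoothing and the "spam"/"good" labels are all assumptions made for illustration, and only tokens that appear in the message are scored.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase runs of letters, digits, "$" and "'".
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class BayesianFilter:
    """Simplified naive Bayes spam filter (illustrative sketch only)."""

    def __init__(self) -> None:
        # The "database": how many spam / good messages contained each token.
        self.token_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text: str, label: str) -> None:
        # Called whenever the user confirms or corrects a message ("spam" or "good").
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        n_spam, n_good = self.message_counts["spam"], self.message_counts["good"]
        # Prior log-odds of spam vs. good mail (add-one smoothed).
        log_odds = math.log((n_spam + 1) / (n_good + 1))
        for token in tokenize(text):
            # Laplace smoothing keeps unseen tokens from forcing a probability of 0 or 1.
            p_token_given_spam = (self.token_counts["spam"][token] + 1) / (n_spam + 2)
            p_token_given_good = (self.token_counts["good"][token] + 1) / (n_good + 2)
            log_odds += math.log(p_token_given_spam / p_token_given_good)
        # Clamp before converting back to a probability so math.exp cannot overflow.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


spam_filter = BayesianFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win cash now", "spam")
spam_filter.train("meeting notes for tomorrow", "good")
spam_filter.train("lunch tomorrow with the team", "good")

print(round(spam_filter.spam_probability("buy cheap cash now"), 2))     # 0.96
print(round(spam_filter.spam_probability("team meeting tomorrow"), 2))  # 0.08
```

Working in log-odds rather than multiplying raw probabilities avoids floating-point underflow on long messages; each correction the user makes simply becomes another `train` call, which is how the filter "learns".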

    Community filters work on a simple voting system: every user who receives a spam message “votes” it as spam. The votes are stored on a central server, and once enough votes have been received, the message is blocked for every user in the community.
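A central vote server of this kind might look roughly like the sketch below. It assumes, purely for illustration, that copies of the same spam are recognized by a SHA-256 hash of the lowercased, whitespace-normalized body, and that the ban threshold is a hypothetical three distinct voters; real community services use their own fingerprinting and thresholds.

```python
import hashlib


def fingerprint(body: str) -> str:
    # Assumption: identical spam is matched by hashing its normalized body.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CommunityFilter:
    """Sketch of a central vote store; vote_threshold is a hypothetical value."""

    def __init__(self, vote_threshold: int = 3) -> None:
        self.vote_threshold = vote_threshold
        self.voters: dict[str, set[str]] = {}  # fingerprint -> ids of users who voted

    def report_spam(self, user_id: str, body: str) -> None:
        # A set ensures each user counts at most once per message.
        self.voters.setdefault(fingerprint(body), set()).add(user_id)

    def is_banned(self, body: str) -> bool:
        # Blocked for everyone once enough distinct users have voted.
        return len(self.voters.get(fingerprint(body), set())) >= self.vote_threshold


server = CommunityFilter(vote_threshold=3)
msg = "Buy cheap pills NOW"
for user in ["alice", "bob", "bob"]:
    server.report_spam(user, msg)
print(server.is_banned(msg))  # False: only two distinct voters so far
server.report_spam("carol", "buy  cheap pills now")
print(server.is_banned(msg))  # True
```

Note that alice, bob and carol all had to receive the message before it was blocked, which is exactly the limitation discussed under accuracy.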

    As you can see, user interaction with these types of filter is mainly limited to a two-button operation for correcting wrongly identified messages, and the more accurate the filter, the less those buttons are used.

    OK, so that’s pretty good. Not exactly zero interaction, but if the filter is accurate enough, then it should be pretty near. That brings us to point two:


    It produces zero false positives or negatives

    This is the area where most spam filter development is concentrated, and things are getting pretty good nowadays. It is not at all unusual to see an efficient modern filter achieve 96% accuracy or better. It is, of course, far better to have a false negative than a false positive if you are ever going to tear yourself away from the killed mail folder, since every false positive is a good message you will only find by digging through it!
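A single accuracy figure can hide exactly the errors that matter most. The sketch below computes accuracy, false positive rate and false negative rate from raw counts; the counts in the example are invented for illustration and are not measurements of any real filter.

```python
from dataclasses import dataclass


@dataclass
class FilterResults:
    true_spam: int        # spam correctly caught
    true_good: int        # good mail correctly delivered
    false_positives: int  # good mail wrongly flagged as spam
    false_negatives: int  # spam that slipped through to the inbox

    def accuracy(self) -> float:
        total = self.true_spam + self.true_good + self.false_positives + self.false_negatives
        return (self.true_spam + self.true_good) / total

    def false_positive_rate(self) -> float:
        # Share of *good* mail that ended up in the killed mail folder.
        return self.false_positives / (self.true_good + self.false_positives)

    def false_negative_rate(self) -> float:
        # Share of spam that reached the inbox.
        return self.false_negatives / (self.true_spam + self.false_negatives)


# Hypothetical batch: 900 spam messages and 100 good ones.
results = FilterResults(true_spam=870, true_good=90, false_positives=10, false_negatives=30)
print(results.accuracy())             # 0.96
print(results.false_positive_rate())  # 0.1
```

In this made-up case the filter is "96% accurate", yet one good message in ten lands in the killed mail folder, which is why the false positive rate deserves separate attention.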

    Of course, by definition, community filters cannot reach 100% accuracy, because someone has to receive the spam before they can vote it as such!

    Theoretically, a Bayesian filter may eventually be able to get quite close to 100% accuracy, so at least there is hope there.

    Content-based filters (those that look for certain words, phrases or other indicators in a message to identify it as spam) will almost certainly not achieve much higher accuracy than the best of them manage today, because adapting to changing spam requires new filters to be created on an ongoing basis.
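A content-based filter can be reduced to a list of hand-written rules, each adding to a score. The patterns, scores and threshold below are hypothetical examples chosen for illustration, not rules from any real product; the point is how easily a trivial misspelling slips past them.

```python
import re

# Hypothetical hand-written rules: (pattern, score).
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), 3.0),
    (re.compile(r"\bfree money\b", re.IGNORECASE), 2.5),
    (re.compile(r"!!!+"), 1.0),
    (re.compile(r"\bact now\b", re.IGNORECASE), 1.5),
]
SPAM_THRESHOLD = 3.0  # hypothetical cut-off


def content_score(message: str) -> float:
    # Add the score of every rule that matches anywhere in the message.
    return sum(score for pattern, score in RULES if pattern.search(message))


def is_spam(message: str) -> bool:
    return content_score(message) >= SPAM_THRESHOLD


print(is_spam("Cheap VIAGRA, act now!!!"))  # True  (3.0 + 1.5 + 1.0)
print(is_spam("Cheap V1AGRA, act now"))     # False (only 1.5): needs a new rule
```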

    And finally, we come to the holy grail of spam filtering:

    It is transparent

    Strangely enough, little work seems to go into achieving this goal. Some of the best filters on the market today identify spam with impressive accuracy and then simply place it in a “killed mail” folder for your later perusal.

    Now, forgive me if I’m missing something here, but isn’t the point to save you from having to wade through the junk mail? Isn’t that what you bought the filter for? With the “SpamSplatter 3000”, you don’t need to do that.

    As we haven’t achieved 100% accuracy yet (and probably never will), the only way to free ourselves from checking the killed mail folder is a challenge/response system, in which a message is automatically sent back to the sender, requiring them to take some action before their message is actually delivered.
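The hold, challenge and release cycle can be sketched as follows. This is an illustrative sketch only: `send_challenge` is a hypothetical hook standing in for whatever actually emails the challenge, and the required "action" is assumed to be echoing back a random token generated with Python's `secrets` module.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChallengeResponse:
    """Sketch: hold a message until its sender echoes back a random token."""
    # Hypothetical hook: (sender, token) -> sends the challenge email.
    send_challenge: Callable[[str, str], None]
    pending: dict[str, tuple[str, str]] = field(default_factory=dict)  # token -> (sender, message)
    inbox: list[str] = field(default_factory=list)

    def hold(self, sender: str, message: str) -> None:
        # Unguessable token, so a spam bot cannot forge a valid reply.
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        self.send_challenge(sender, token)

    def respond(self, sender: str, token: str) -> bool:
        held = self.pending.get(token)
        if held is None or held[0] != sender:
            return False
        del self.pending[token]
        self.inbox.append(held[1])  # the sender took action, so deliver the message
        return True


outbox: list[tuple[str, str]] = []
cr = ChallengeResponse(send_challenge=lambda sender, token: outbox.append((sender, token)))
cr.hold("friend@example.com", "Lunch on Friday?")
sender, token = outbox[0]
print(cr.respond(sender, token))  # True: the message is now in cr.inbox
```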

    Some systems go overboard with challenge/response. These systems, often called “whitelist” systems, block messages from anyone who isn’t in the user’s friends list. They are guaranteed to be 100% effective, but they are too drastic a measure for most users.
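A whitelist system's routing rule ignores message content entirely, which is what makes it both airtight and heavy-handed. In this minimal sketch the function name is hypothetical, and comparing addresses case-insensitively is an assumption (common in practice, though the local part of an address may technically be case-sensitive).

```python
def route_whitelist(sender: str, friends: set[str]) -> str:
    """Whitelist-style routing: every unknown sender is challenged, whatever the content."""
    # Assumption: addresses are compared case-insensitively.
    if sender.strip().lower() in friends:
        return "deliver"
    return "challenge"


friends = {"alice@example.com", "bob@example.com"}
print(route_whitelist("Alice@Example.com", friends))        # deliver
print(route_whitelist("new.client@example.org", friends))   # challenge
```

Every legitimate first-time correspondent, such as the new client above, has to answer a challenge before reaching you.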

    It seems, then, that the most intelligent use of this system would be to send challenges only for messages flagged as “questionable”. Good messages can be delivered, definite spam can be deleted, and questionable messages earn their senders a challenge.
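The three-way split can be expressed as two thresholds on a spam score, such as the probability a Bayesian filter produces. A minimal sketch follows; the threshold values 0.2 and 0.95 are hypothetical tuning parameters, not recommendations.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    CHALLENGE = "challenge"
    DELETE = "delete"


def triage(spam_probability: float, good_below: float = 0.2, spam_above: float = 0.95) -> Action:
    """Route a message by its spam score; both thresholds are hypothetical."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability < good_below:
        return Action.DELIVER    # clearly good: straight to the inbox
    if spam_probability > spam_above:
        return Action.DELETE     # definite spam: discard without bothering anyone
    return Action.CHALLENGE      # questionable: ask the sender to take action


for score in (0.05, 0.5, 0.99):
    print(score, triage(score).value)
```

Widening the band between the two thresholds trades more challenges sent for fewer false positives silently deleted.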

    So, to sum up, let’s rewrite the qualities of our perfect filter and get a shopping list of what to look for while we wait for the “SpamSplatter 3000” to arrive:

    1. Simple, minimal setup and maintenance.
    2. Extremely low rate of false positives and as few false negatives as possible.
    3. A transparent “fail-safe” mechanism whereby the victims of those false positives can force the message through to you.

    It’s simple, really. Now, who’s going to build me this “SpamSplatter 3000”…?

    About The Author

    Alan Hearnshaw is the owner of http://www.WhichSpamFilter.com, a site which provides weekly in-depth spam filter reviews, anti-spam help and guidance, user ratings and a community forum.

    alan@whichspamfilter.com



    © 2003-2018 by Developer Shed. All rights reserved.