How Spammers Fool Bayesian Filters - And How to Stop Them
by: Paul Judge, CTO, CipherTrust, Inc.
Effectively stopping spam over the long term requires much more than blocking individual IP addresses and creating rules based on keywords that spammers typically use. The increasing sophistication of spam tools, coupled with the increasing number of spammers in the wild, has created a hyper-evolution in the variety and volume of spam. The old ways of blocking the bad guys just don’t work anymore.

Examining spam and spam-blocking technology can illuminate how this evolution is taking place and what can be done to combat spam and reclaim e-mail as the efficient, effective communication tool it was intended to be.

One method used to combat spam is Bayesian filtering. Named after Thomas Bayes, an English mathematician, Bayesian logic is used in decision making and inferential statistics. Bayesian filters maintain a database of known spam and ham, or legitimate email. Once the database is large enough, the system ranks the words according to the probability they will appear in a spam message.

Words more likely to appear in spam are given a high score (between 51 and 100), and words likely to appear in legitimate email are given a low score (between 1 and 50). For example, the words “free” and “sex” generally have values between 95 and 98, whereas the words “emphasis” or “disadvantage” may have a score between 1 and 4. Commonly used words such as “the” and “that”, and words new to the Bayesian filter, are given a neutral score between 40 and 50 and would not be used in the system’s algorithm.
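The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the function names, the `min_count` cutoff, and the choice of 45 as the neutral score (the midpoint of the 40 to 50 band mentioned above) are all assumptions made for the example.

```python
import re
from collections import Counter

NEUTRAL_SCORE = 45  # assumption: midpoint of the neutral band (40 to 50)


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_scores(spam_messages, ham_messages, min_count=2):
    """Return {word: score} on a 1-99 scale; higher means more spam-like.

    Each word is counted once per message, and words seen in fewer than
    min_count messages overall are left out (an assumed cutoff).
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for msg in spam_messages:
        spam_counts.update(set(tokenize(msg)))
    for msg in ham_messages:
        ham_counts.update(set(tokenize(msg)))

    n_spam = max(len(spam_messages), 1)
    n_ham = max(len(ham_messages), 1)
    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        if spam_counts[word] + ham_counts[word] < min_count:
            continue  # too rare to judge; treated as a new word later
        spam_rate = spam_counts[word] / n_spam
        ham_rate = ham_counts[word] / n_ham
        raw = 100 * spam_rate / (spam_rate + ham_rate)
        scores[word] = min(99, max(1, round(raw)))  # keep clear of 0 and 100
    return scores


def word_score(scores, word):
    """Look up a word; words new to the filter get the neutral score."""
    return scores.get(word, NEUTRAL_SCORE)
```

A real filter would train on thousands of messages and would keep updating these counts as users report spam and ham.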

When the system receives an email, it breaks the message down into tokens, or words with values assigned to them. The system utilizes the tokens with scores on the high and low ends of the range and develops a score for the email as a whole. If the email has more spam tokens than ham tokens, the email will have a high spam score. The email administrator determines the threshold score the system uses to allow email to pass through to users.
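As a rough sketch of that scoring pass, the Python below tokenizes a message, skips tokens in the neutral band, and combines the rest into one score that is compared with an administrator-chosen threshold. The score table is hypothetical, and the combining formula (the product of the spam probabilities divided by that product plus the product of their complements) is one common naive Bayes approach assumed here for illustration; the exact formula varies from filter to filter.

```python
import math
import re

# Hypothetical score table on the 1-100 scale described in this article.
WORD_SCORES = {
    "free": 97, "sex": 96,
    "emphasis": 3, "disadvantage": 2,
    "the": 45, "that": 45,
}
NEUTRAL_SCORE = 45        # assumed score for words new to the filter
NEUTRAL_BAND = (40, 50)   # scores in this band are ignored


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def spam_score(text, word_scores=WORD_SCORES):
    """Return a 0-100 spam score for a whole message."""
    log_spam = 0.0
    log_ham = 0.0
    used = 0
    for token in set(tokenize(text)):
        score = word_scores.get(token, NEUTRAL_SCORE)
        if NEUTRAL_BAND[0] <= score <= NEUTRAL_BAND[1]:
            continue  # neutral tokens carry no evidence
        p = min(0.99, max(0.01, score / 100))
        log_spam += math.log(p)
        log_ham += math.log(1 - p)
        used += 1
    if used == 0:
        return float(NEUTRAL_SCORE)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log
    # space and clamped so math.exp cannot overflow on long messages.
    diff = max(-700.0, min(700.0, log_ham - log_spam))
    return 100 / (1 + math.exp(diff))


def is_spam(text, threshold=90):
    """Apply the administrator's threshold (90 is an arbitrary example)."""
    return spam_score(text) >= threshold
```

Lowering the threshold catches more spam at the cost of more false positives, which is why the choice is left to the administrator.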

Bayesian filters are effective at filtering spam and minimizing false positives. Because they adapt and learn based on user feedback, Bayesian filters produce better results as they are used within an organization over time. They are not, however, foolproof. Spammers have learned which words Bayesian filters consider spammy and have developed ways to insert non-spammy words into emails to lower a message’s overall spam score. By adding in paragraphs of text from novels or news stories, spammers can dilute the effects of high-ranking words. Text insertion has also caused normally legitimate words that are found in novels or news stories to have an inflated spam score. This may potentially render Bayesian filters less effective over time.
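The dilution effect is easy to see with a little arithmetic. The sketch below combines per-token scores by summing their log-odds, which is an assumed naive Bayes style of combination rather than the formula of any specific filter, and the token scores themselves are made up for illustration.

```python
import math


def combine(scores):
    """Combine per-token scores (1-99) into one message score (0-100)."""
    log_odds = sum(math.log(s / (100 - s)) for s in scores)
    log_odds = max(-700.0, min(700.0, log_odds))  # keep math.exp in range
    return 100 / (1 + math.exp(-log_odds))


spam_tokens = [97, 96, 95]   # hypothetical scores for spammy words
padding = [3, 2, 4, 3]       # hypothetical scores for words lifted from a novel

before = combine(spam_tokens)
after = combine(spam_tokens + padding)

print(f"spam words only:   {before:.1f}")
print(f"with padding text: {after:.1f}")
```

With these made-up numbers, the padded message falls from well above a typical threshold to well below it, even though every spammy word is still present.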

Another approach spammers use to fool Bayesian filters is to create less spammy emails. For example, a spammer may send an email containing only the phrase, “Here’s the link…”. This approach can neutralize the spam score and entice users to click on a link to a Web site containing the spammer’s message. To block this type of spam, the filter would have to be designed to follow the link and scan the content of the Web site users are asked to visit. This type of filtering is not currently employed by Bayesian filters because it would be prohibitively expensive in terms of server resources and could potentially be used as a method of launching denial of service attacks against commercial servers.

As with all single-method spam filtering methodologies, Bayesian filters are effective against certain techniques spammers use to fool spam filters, but are not a magic bullet for solving the spam problem. Bayesian filters are most effective when combined with other methods of spam detection.

The Solution
When used individually, each anti-spam technique has been systematically overcome by spammers. Grandiose plans to rid the world of spam, such as charging a penny for each e-mail received or forcing servers to solve mathematical problems before delivering e-mail, have been proposed with few results. These schemes are not realistic and would require a large percentage of the population to adopt the same anti-spam method in order to be effective. You can learn more about the fight against spam by visiting our website at www.ciphertrust.com and downloading our whitepapers.


About the author:

Dr. Paul Judge is a noted scholar and entrepreneur. He is Chief Technology Officer at CipherTrust, the industry's largest provider of enterprise email security. The company’s flagship product, IronMail, provides a best-of-breed enterprise anti-spam solution designed to stop spam, phishing attacks and other email-based threats. Learn more by visiting www.ciphertrust.com/products/spam_and_fraud_protection today.

Circulated by Article Emporium

 



© 2005 - All Rights Reserved