Gioffre Consulting

The GroupWise Experts!  
 
 
SPAM: The Problem and How to Combat It
Copyright Notice:  This article is Copyright 2005 by Gioffre Consulting.  It may not be reproduced, copied, distributed, or posted on other web sites without the express written permission of Gioffre Consulting.

Are you interested in Advanced GroupWise Administration training?
Are you interested in implementing Anti-SPAM solutions?

Do you like a fun and interesting training environment?

Then join us for our 5th Annual Novell Training Cruise!

You get top-notch technical training and a trip to the Caribbean!

Full Details are available at http://www.gioffre.com/upcomingtrainingcruises.html


SPAM: The Problem and How to Combat It
by Frank Gioffre
17-Sep-2005

About the Author:
Frank Gioffre is an independent consultant who runs Gioffre Consulting. He plans, configures, designs, and troubleshoots systems for clients throughout the U.S. Frank focuses on Novell products (NetWare, eDirectory, ZENworks, and especially GroupWise). His certifications include Master CNE, Master CNI, and CDE.

The Problem

SPAM has quickly become the number one problem facing network administrators today.  The amount of SPAM received continues to grow rapidly while federal and state governments try to come up with a legislative solution to the issue.  The CAN-SPAM Act of 2003 has been ineffective, because the nature of SPAM and the global Internet infrastructure make enforcement of the law very difficult.  The other problem with the CAN-SPAM Act is that it does not prohibit SPAM itself.  It only prohibits misleading tactics, inaccurate subject lines, unauthorized relaying, and fake source addresses.  This means that even if we achieved 100% compliance with this law, you still could and would receive SPAM.  You can read the entire text of the CAN-SPAM Act (PDF file) here:
Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (Public Law 108–187—DEC. 16, 2003).

In November 2004, the nation’s first felony SPAM conviction was handed down in Virginia.  This is a promising result, but appeals are underway and we do not yet know how this will affect further litigation.  Once again, even successful litigation will not substantially curb the flow of SPAM because:

  • For every conviction, there are thousands of spammers that are not caught.

  • Spammers can set up shop outside of the United States.

  • So-called "legitimate" marketing E-Mail is not affected by this law.

Estimates of SPAM volume in the United States vary greatly, but no one disputes that it is a major problem.  Some recent studies estimate that as much as 90% of all E-Mail messages traversing the Internet are SPAM.  This means that SPAM is not only a nuisance; it also impedes the functioning of your E-Mail systems and costs money to handle.  Space is used on E-Mail servers, processor time is consumed handling the sheer volume of SPAM, administrator time is taken up dealing with SPAM problems, and employee productivity is eroded by sorting and deleting unwanted messages.

This article is generic in nature, meaning that it covers the SPAM problem without regard for any particular E-Mail system.  However, I deal mostly with GroupWise and I will mention some specifics as to how these issues pertain directly to GroupWise.

The Solution

Legislation is not an effective deterrent, and manual sorting of messages is impractical; therefore, the solution must be found in automated sorting, classification, and deletion of SPAM through the use of E-Mail scanning software.

This sort of software has existed for several years now, with different packages using various methods to classify messages as either SPAM or legitimate E-Mail.  The concept behind this software is simple: electronically read the message and check for keywords and characteristics that identify it as SPAM.  Implementing this, however, is anything but simple.  Spammers continually modify the content and format of their messages in order to avoid detection by these software packages.  The end result is a never-ending cat-and-mouse game between the software vendors and the spammers.

The current generation of craftily designed SPAM messages uses obfuscation techniques such as:

  • Random text added to messages to evade keyword scanners.
  • Spaces and other characters mixed into words to evade keyword scanners.
  • False header and source information to evade server blacklists.
  • Messages containing nothing but a URL link to the spammer's web site.
  • Messages that are 100% graphics, with no keywords to scan.
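The first two tricks in this list target exact keyword matching. The minimal Python sketch below, assuming a hypothetical two-word keyword list, shows how spacing out a word slips past an exact match and how one simple countermeasure (lowercasing and deleting everything except letters before searching) undoes that particular trick. It is illustrative only; commercial scanners use far more elaborate normalization.

```python
import re

# Hypothetical "bad" keyword list; real scanners ship much larger databases.
BAD_WORDS = {"mortgage", "airfare"}

def naive_hits(text):
    """Exact word matching: the kind of check that obfuscation is designed to evade."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w in BAD_WORDS}

def normalized_hits(text):
    """Collapse the text to letters only, then look for each keyword as a substring."""
    collapsed = re.sub(r"[^a-z]", "", text.lower())
    return {w for w in BAD_WORDS if w in collapsed}
```

For "Low m o r t-g a g e rates!", `naive_hits` finds nothing while `normalized_hits` finds "mortgage". The trade-off is that substring matching across collapsed word boundaries can itself produce false positives, which is one more reason keyword hits should only ever add to a score rather than decide on their own.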

Fortunately, we may hold the upper hand, because no matter how crafty the spammer is, all SPAM has one thing in common: it is trying either to sell you something or to get you to visit a web site for some other reason.  Therefore, we are starting to see more scanning techniques that focus not on where the message is coming from, nor on what its content is, but on where the message is trying to lead you.

The most effective anti-SPAM software packages must use a combination of techniques to classify SPAM.  Furthermore, the scoring algorithms must be stringent enough to catch a high percentage of SPAM without producing a high percentage of false positives.  False positives are legitimate E-Mail messages (otherwise known as HAM) that are incorrectly classified as SPAM.  This is a major problem, as discarding legitimate E-Mail messages can do great damage to any company.  The software must walk a fine line, and when in doubt it should lean towards letting more SPAM through rather than discarding HAM.

The last consideration is that of tailoring the classification engine to your needs.  One man's SPAM may be another man's HAM (never thought I'd see that sentence in print), so the software must have the ability to be fine-tuned based on the company or even the individual involved.  As an example, one big SPAM category is that of "low mortgage rates".  You may not be in the market for a new home so you want all messages with keywords like "mortgage" and "low rates" discarded.  However, if you are a loan officer at a bank then that rule would obviously be bad for business.
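One way to support this kind of tailoring is to keep a set of site-wide rule weights and let individual users override them. The Python sketch below assumes hypothetical rule names and a hypothetical loan-officer address; it does not describe how any particular product stores its settings.

```python
# Hypothetical site-wide rule weights (points added when a rule hits).
SITE_WEIGHTS = {
    "MORTGAGE_KEYWORDS": 2.5,
    "LOW_RATES_PHRASE": 1.5,
    "URL_ONLY_MESSAGE": 1.0,
}

# Hypothetical per-user overrides: the loan officer neutralizes the mortgage rules.
USER_OVERRIDES = {
    "loanofficer@bank.example": {"MORTGAGE_KEYWORDS": 0.0, "LOW_RATES_PHRASE": 0.0},
}

def weights_for(recipient):
    """Start from a copy of the site-wide weights and apply this recipient's overrides."""
    weights = dict(SITE_WEIGHTS)
    weights.update(USER_OVERRIDES.get(recipient, {}))
    return weights

def score(rule_hits, recipient):
    """Sum the weights of the rules that hit, using the recipient's tailored weights."""
    weights = weights_for(recipient)
    return sum(weights.get(rule, 0.0) for rule in rule_hits)
```

The same "low mortgage rates" message then scores 4.0 for an ordinary user but 0.0 for the loan officer, without changing the site-wide rules for everyone else.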

The Implementation

Anti-SPAM software has made great advances over the past few years.  Current classification techniques allow for very high identification rates with correspondingly low false positive rates.  Let's look at how most Anti-SPAM software classifies SPAM and then the various techniques used in the classification process.

Rating the Messages

No matter how good the software is at classifying SPAM, there is no absolute verdict of SPAM versus HAM.  Instead, we use a scale to give each message a SPAM score or SPAM rating.  The process involves three distinct steps:

  • Step 1: Apply the Rating Rules and Determine the SPAM Score
    The score assigned to each message is determined by cycling through all of the rating methods and rules applied by the software.  Each "hit" against a particular rule adds an incremental amount to the total score (rules that indicate HAM, such as a low Bayesian probability, subtract from it).  The cumulative total of all the rule hits gives us the SPAM score for that message.
  • Step 2: Compare SPAM Score to Cut Score
    Once the score is determined, it is compared against a cut score.  That cut score is usually set by the system administrator and determines which messages are classified as SPAM.  Any message with a SPAM score above the cut score is determined to be SPAM.
  • Step 3: Process the SPAM
    Once a message has been determined to be SPAM, it must be processed accordingly.  There are many different techniques that can be followed.  One technique is to insert a unique keyword (like S-P-A-M) in the subject line so that the client E-Mail software can process the message with a rule (delete or move).  Other possibilities include moving the message to a quarantine area, forwarding it to a special SPAM account, or deleting it altogether.  More advanced software can perform different actions based on multiple cut scores.
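The three steps above can be sketched in a few lines of Python. The rule names, weights, and the second "discard" cut score are hypothetical, chosen only to show the mechanics: step 1 sums the weights of the rules that hit, and steps 2 and 3 compare that sum against cut scores to pick an action.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    name: str
    weight: float                   # points added to the score when the rule hits
    matches: Callable[[str], bool]  # predicate applied to the message text

def spam_score(text: str, rules: List[Rule]) -> float:
    """Step 1: cycle through every rule and add the weight of each hit."""
    return sum(rule.weight for rule in rules if rule.matches(text))

def classify(score: float, cut_score: float, discard_score: float) -> str:
    """Steps 2 and 3: compare against two (hypothetical) cut scores and pick an action."""
    if score > discard_score:
        return "quarantine"
    if score > cut_score:
        return "tag subject"
    return "deliver"

# Hypothetical rules: no single hit is enough to cross a cut score of 3.4.
RULES = [
    Rule("MENTIONS_AIRFARE", 1.0, lambda t: "airfare" in t.lower()),
    Rule("LOW_PRICE_PHRASE", 1.5, lambda t: "low airfare" in t.lower()),
    Rule("UNSUBSCRIBE_LINK", 1.2, lambda t: "unsubscribe" in t.lower()),
]
```

With a cut score of 3.4, a client's question about airfare scores only 1.0 and is delivered, while "LOW AIRFARE deals! To unsubscribe click here" collects all three hits (3.7) and has its subject tagged.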

The benefit of this method is that no one rule can force a message to be rated as SPAM.  It takes many SPAM-like traits in combination to classify a message as SPAM.  This is how you avoid false positives.  As an example, let's say I received a message from a client asking me what the airfare would be for me to visit them.  I don't want the word "airfare" by itself to trigger the SPAM alert.  However, I do want the unsolicited "low airfare" messages to go straight to the bit-bucket.  Here's a real-life sample of just such a message that was properly classified as SPAM:

FROM: Travel Flea Market <TravelFleaMarket@brightstuff.com>
SUBJECT: Vacation w/Air 199, Cruise 5 Nts 399 & More Inside
MESSAGE:

------------------------------------------------------------------------- brightstuff.com

You are subscribed with the email address omitted@gioffre.com. If you wish to be excluded from future offers, please use the link below:

http://web.web.brightstuff.com/r/alqtkukeauuavhmiv2eprvvb or send an email to:
unsubscribe-alqtkukeauuavhmiv2eprvvb@brightstuff.com

Email us support@brightstuff.com

-------------------------------------------------------------------------

The rest of the message was all HTML.  Note, however, that there are links in the message (and also in the HTML) that point to where the spammer wanted me to go.  The SPAM score for this message was 6.42, as determined by the accumulation of these hits:

BAYES_20: (-1.95)
HTML_IMAGE_ONLY_20: (0.45)
RCVD_IN_BL_SPAMCOP_NET: (1.22)
HTML_TEXT_AFTER_BODY: (0.06)
MULTI_REMOVAL_1WORD: (0.80)
HTML_FONT_FACE_BAD: (0.04)
URIBL_OB_SURBL: (3.21)
URIBL_SBL: (1.00)
URIBL_WS_SURBL: (1.46)
RCVD_IN_SBL: (0.11)
HTML_TEXT_AFTER_HTML: (0.03)
 

My system's cut score is set at 3.4, so this message is classified as SPAM.
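The arithmetic behind that decision can be checked directly. The Python snippet below simply adds up the rule hits listed above and compares the total with the 3.4 cut score. Adding the listed values gives 6.43 rather than the reported 6.42, presumably because the per-rule values shown are rounded to two decimals; the classification is the same either way.

```python
# Rule hits reported for the sample message above (rule name -> points).
HITS = {
    "BAYES_20": -1.95,
    "HTML_IMAGE_ONLY_20": 0.45,
    "RCVD_IN_BL_SPAMCOP_NET": 1.22,
    "HTML_TEXT_AFTER_BODY": 0.06,
    "MULTI_REMOVAL_1WORD": 0.80,
    "HTML_FONT_FACE_BAD": 0.04,
    "URIBL_OB_SURBL": 3.21,
    "URIBL_SBL": 1.00,
    "URIBL_WS_SURBL": 1.46,
    "RCVD_IN_SBL": 0.11,
    "HTML_TEXT_AFTER_HTML": 0.03,
}
CUT_SCORE = 3.4  # the cut score quoted in the article

total = sum(HITS.values())
is_spam = total > CUT_SCORE

# Points contributed by the URI blacklist (URIBL_*) rules alone.
uri_points = sum(v for name, v in HITS.items() if name.startswith("URIBL_"))

print(f"score={total:.2f} spam={is_spam} uri_points={uri_points:.2f}")
# prints: score=6.43 spam=True uri_points=5.67
```

Note that the Bayesian rule actually pulled the score down by 1.95 points, yet the three URI blacklist rules alone contribute 5.67 points, enough to exceed the cut score by themselves.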

Rating Methods

It all comes down to the rules and methods that are used to score the messages.  Let's take a look at some of the various methods available.

Keyword/Phrase Matching

With this rating method, the software compares all the words in a message against a list of "bad" words in its database, with each hit causing the SPAM score to increase.  This is one of the simplest, oldest, and least effective methods of rating SPAM.  Today's sophisticated spammers know which keywords to avoid.  Furthermore, keyword matching alone can also lead to many false positives.  In GroupWise, you can accomplish basic keyword matching by using rules to match on content and automatically delete the messages.

Keyword matching can also include FROM address matching.  This is the basic functionality built into the GroupWise client's Junk and Block lists.  While this may be effective in blocking "nuisance" E-Mail from real people, it does little to block true SPAM.
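A minimal Python sketch of FROM address matching, assuming a hypothetical block list that holds both full addresses and whole domains (this is not how the GroupWise client stores its lists), makes the weakness easy to see.

```python
# Hypothetical personal block list: full addresses and whole domains.
BLOCKED_ADDRESSES = {"annoying.cousin@example.net"}
BLOCKED_DOMAINS = {"brightstuff.com"}

def is_blocked(from_address: str) -> bool:
    """Return True if the FROM address, or its domain, is on the block list."""
    address = from_address.strip().lower()
    domain = address.rpartition("@")[2]
    return address in BLOCKED_ADDRESSES or domain in BLOCKED_DOMAINS
```

The sender from the earlier sample (TravelFleaMarket@brightstuff.com) is blocked, but the same spammer mailing from a slightly different domain, such as deals@brightstuff-offers.com, sails straight through. A real person rarely changes addresses; a spammer can do it with every mailing.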

Bayesian Filtering

Bayesian filtering is an advanced form of keyword matching.  It uses statistical algorithms that check for good words (from HAM) as well as bad words (from SPAM), and then calculates a probability score from 0 to 1.  A score of 0 indicates a 0% probability that the message is SPAM, while a score of 1 indicates a 100% probability.  In practice, Bayesian scores tend to cluster at the extremes: messages are scored either very low (in the 0% to 20% range) or very high (in the 80% to 100% range), with not much in between.

The key to Bayesian filtering is a good base of keywords to compare against.  This keyword base is called the corpus.  Typically you will have a SPAM corpus and a HAM corpus.  Most software packages that use Bayesian filtering will provide you with a base corpus as well as the ability to add your own keywords to the corpus.  More advanced packages will build the corpus automatically by letting you submit messages that you personally classify as SPAM or HAM.
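To make the corpus idea concrete, here is a simplified Python sketch of a Bayesian-style filter trained on a SPAM corpus and a HAM corpus. It is loosely in the spirit of the approach Paul Graham described, but it is not his exact recipe nor any product's algorithm: it uses every token rather than only the most "interesting" ones, counts each word once per message, and clamps per-word probabilities to the range 0.01 to 0.99.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class BayesFilter:
    def __init__(self):
        self.spam_counts = Counter()  # messages in the SPAM corpus containing each token
        self.ham_counts = Counter()   # messages in the HAM corpus containing each token
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Add one message to the SPAM or HAM corpus (each token counted once)."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_msgs += 1

    def token_prob(self, token):
        """Estimated probability that a message containing this token is SPAM."""
        s = self.spam_counts[token] / max(self.spam_msgs, 1)
        h = self.ham_counts[token] / max(self.ham_msgs, 1)
        if s + h == 0:
            return 0.5  # never seen before: neutral
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text):
        """Combine per-token probabilities into a single score from 0 to 1."""
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Work in log space so long messages do not underflow.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        d = log_spam - log_ham
        # Numerically stable logistic function of d.
        if d >= 0:
            return 1 / (1 + math.exp(-d))
        e = math.exp(d)
        return e / (1 + e)
```

Because every token pushes the combined probability toward 0 or 1, the results tend to land at the extremes, which matches the clustering described above. Training on messages you classify yourself is simply more calls to `train`.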

For some further reading on Bayesian filtering and SPAM topics, please visit Paul Graham's web site at http://www.paulgraham.com/antispam.html

BlackLists

A blacklist is simply a list of source IP addresses.  Your system will automatically refuse any E-Mail messages originating from any server on your blacklist.  This is a simple and fairly ineffective method of preventing SPAM since the sophisticated spammers will not keep sending from the same IP address for very long.  In GroupWise, your GWIA (GroupWise Internet Agent) fully supports setting up your own blacklist.
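The check itself is trivial, which is both its strength and its weakness. The Python sketch below uses the standard `ipaddress` module and a hypothetical blacklist of one address and one /24 range (taken from the documentation-only address blocks); it is not GWIA's configuration format.

```python
import ipaddress

# Hypothetical local blacklist: a single address and a whole range (CIDR notation).
BLACKLIST = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blacklisted(source_ip: str) -> bool:
    """Return True if the connecting server's IP falls inside any blacklisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in BLACKLIST)
```

A spammer who moves from 203.0.113.7 to the neighboring 203.0.113.8 is no longer caught, which is why a hand-maintained list falls behind so quickly.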

Realtime Blackhole List (RBL)

An RBL is a list of open relay servers that are being exploited by spammers to send out great amounts of SPAM.  The really nice feature of RBLs is that you (the system administrator) do not need to keep an RBL up to date yourself.  There are RBL services that your GroupWise system can query to look up and block open relay sources.
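RBL services are usually queried over DNS, following the convention later documented in RFC 5782: reverse the octets of the connecting server's IPv4 address, append the list's zone name, and look the result up. If the name resolves, the address is listed. The Python sketch below uses a hypothetical zone name, `rbl.example.org`; a production check would also inspect the returned 127.0.0.x code, which some lists use to say why an address is listed.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: the IPv4 octets reversed, then the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the query name resolves, meaning the RBL lists this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # Name does not exist, or the lookup failed: treat as not listed.
        return False
```

For example, checking 192.0.2.99 against `rbl.example.org` means looking up `99.2.0.192.rbl.example.org`. Treating a failed lookup as "not listed" keeps an RBL outage from bouncing legitimate mail.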

Over the years a few problems have cropped up with RBLs.  The first is that legitimate E-Mail messages from non-spammers get rejected because their server has been placed on an RBL.  A listing usually means the server really was open for relaying, but the bigger problem is getting the server removed from the RBL once it has been fixed.

There is an old adage that says "absolute power corrupts absolutely."  This has proven true of some public RBLs on the Internet.  The power to cripple a company by putting it on an RBL led to the demise of some very well-known RBLs.  Some RBL operators were so self-righteous that they made the problem worse by refusing to remove servers from their lists even after the servers were fixed.  Other RBL services added servers simply because they did not like the way a company did business, making the list more a form of censorship than a SPAM filter.

RBLs today are less popular and much less effective than they were 4 or 5 years ago.  By some estimates, only 2% of SPAM can be blocked by using RBLs alone.  GWIA supports RBLs; you simply specify the RBL servers to query.

Reverse Address Lookup

This method looks at the E-Mail header of the message arriving at your server, which provides a source IP address.  The FROM address is parsed to extract the source domain, and a DNS (Domain Name System) lookup on that domain yields the registered IP addresses of the domain's mail exchange (MX record) hosts.  The result of that lookup is then compared to the header IP address, and if the two do not match, the message is considered suspect.
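The comparison can be sketched in Python as follows. The `mx_ips_for` argument is a hypothetical resolver that returns the MX hosts' IP addresses for a domain; a real implementation would query DNS, but a small dictionary stands in for it here so the logic can be run offline. The fake data reuses the article's fictitious Patrick Star.

```python
from typing import Callable, Set

def is_suspect(header_ip: str, from_address: str,
               mx_ips_for: Callable[[str], Set[str]]) -> bool:
    """Flag the message if the connecting IP is not one of the FROM domain's MX hosts."""
    domain = from_address.rpartition("@")[2].lower()
    return header_ip not in mx_ips_for(domain)

# Hypothetical DNS data: kkrab.com's only MX host lives at 192.0.2.25.
FAKE_DNS = {"kkrab.com": {"192.0.2.25"}}

def lookup(domain):
    return FAKE_DNS.get(domain, set())
```

Mail that Patrick sends from the office (192.0.2.25) passes, but the same FROM address arriving through his cable provider's server, say 198.51.100.77, is flagged as suspect even though it is perfectly legitimate. That is exactly the false positive described next.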

Some E-Mail systems (including GroupWise) can reject inbound messages based solely on this one criterion.  At first it sounds like a great method, since spammers practically never use legal or valid means to send their messages.  However, if you turn on this feature, you will end up tossing large amounts of legitimate messages.  With today's mobile workforce, there are many legitimate reasons why a reverse lookup will fail.  As an example, let's say that the fictitious Patrick Star has a work E-Mail address of pstar@kkrab.com.  He is working from home and using his cable modem service provider's E-Mail server, and he has set up his E-Mail software at home to show a FROM address of pstar@kkrab.com.  When you receive his message and perform a reverse lookup on the E-Mail address, it will not match the source address, since the message was not sent from the office.

This method is so troublesome that I would recommend NEVER turning it on.

Rogue Methods

Certain Internet Service Providers have come up with other methods of blocking SPAM for their members.  While I truly understand the magnitude of the problem, some of these providers are going way beyond what I would call sensible.  Some of these overly stringent policies are disrupting important business and personal correspondence.  The major culprit in my opinion is AOL (America Online).

AOL uses many other procedures to filter (so-called) SPAM.  The problem is that the filters are overly strict, almost random in design, and undisclosed.  I can only guess at what the filters are through trial and error.

One method I have seen used is to block any E-Mail message that has more than a certain number of recipients; the exact limit depends on the ISP.  While it is true that spammers send messages to large distribution lists, there are also legitimate uses for large distribution lists.  What if you need to send an invitation to a conference to 350 people?  Is that SPAM?

SPAM URI Realtime Blacklists (SURBL)

This is the most powerful new technique, and potentially the one that spammers will have the most trouble circumventing.  It works so well because SURBL does not care about the source address of the message.  Instead, SURBL searches the message (plain text, HTML code, and graphic links) for embedded destination URIs (web page addresses).  As I mentioned before, there is one common trait that no spammer can avoid: they need to direct you to a web site!
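The first half of a SURBL check, finding the web sites a message points to and turning each into a DNS query name, can be sketched in Python as below. The query is made the same way as an RBL lookup, except that the domain name itself (not a reversed IP address) is placed in front of the list's zone; `multi.surbl.org` is SURBL's combined list. The domain extraction is deliberately naive (it keeps only the last two labels of the host name), which breaks for names such as `example.co.uk`; real implementations consult a list of registry suffixes.

```python
import re
from urllib.parse import urlsplit

# http(s) URLs in plain text or HTML attributes; stops at whitespace, quotes, and brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def extract_domains(body: str) -> set:
    """Pull a registered-domain guess out of every http(s) URL in the message body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        host = (urlsplit(url).hostname or "").lower()
        labels = host.split(".")
        if len(labels) >= 2:
            # Naive simplification: keep only the last two labels of the host name.
            domains.add(".".join(labels[-2:]))
    return domains

def surbl_query_names(body: str, zone: str = "multi.surbl.org") -> set:
    """Build one DNS query name per domain: <domain>.<zone>."""
    return {f"{d}.{zone}" for d in extract_domains(body)}
```

Applied to the sample message's unsubscribe link, http://web.web.brightstuff.com/r/..., this yields the query name `brightstuff.com.multi.surbl.org`, no matter which server actually delivered the message.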

Why would this be any more difficult to circumvent than the source IP address?  Sure, web site addresses can be changed regularly, but doing so would completely destabilize the sales structure the spammers are trying to promote.  The purpose of SPAM is to sell you something.  If the web site address keeps changing, potential customers won't be able to find it, and the SPAM becomes useless as a sales tool.  Imagine seeing a television commercial for a product that can be ordered only by phone.  If that phone number were to change after only 1 or 2 days, the impact and benefit of the commercial would be greatly diminished.

For more information on SURBL implementation, please visit http://www.surbl.org




The Products

There are many products today that fight SPAM.  The product you choose depends on the type of implementation you desire, the E-Mail system you use, the cost, and of course the capabilities of the product.

Stand-Alone Solutions

By stand-alone solutions, I mean any solution that is not a software package and does not integrate directly with your E-Mail system.  There are two solutions that fall into this category.

The first is a stand-alone appliance or hardware device.  There are many vendors that sell these devices.  The benefit is that they examine the raw SMTP (Simple Mail Transfer Protocol) traffic and therefore work with any E-Mail system (GroupWise, Sendmail, Lotus Notes, etc.).  The biggest disadvantage is the initial cost of the device itself.  I have not reviewed any of these products.

The second stand-alone solution is a third-party service that does all the scanning for you.  There are quite a few reputable companies in this space.  Just like appliances, they examine the raw SMTP message and work with any E-Mail system.  One of the biggest advantages of a service like this is that it almost completely eliminates setup, configuration, and maintenance work for you.  The disadvantage is that the operation is not completely under your control.  Pricing for these services is based on the number of users, and most are quite reasonable.

Software Solutions

Generic Solutions

Software solutions can either stand on their own or integrate tightly with your particular E-Mail system.  The solutions that integrate with your E-Mail system can typically exploit some of the features of that system in order to become a more complete and feature-rich solution.

No discussion of Anti-SPAM software would be complete without discussing SpamAssassin, as it is at the core of many other software solutions.  SpamAssassin is an open source project that is making great strides in the battle against SPAM.  As it is open source, many other software packages (both open source and commercial) use its code, concepts, and innovations in order to provide you with a solution that meets all your needs.

For more information on SpamAssassin, including downloads and FAQ listings, please visit http://spamassassin.apache.org

GroupWise Solutions

There are some excellent Anti-SPAM solutions designed specifically for GroupWise.  Most notable are GWAVA (http://www.gwava.com/products/gwava_overview.html) and GeeWhiz (http://www.omni-ts.com/products--gee-whiz-spam-filter-and-anti-virus.html).

Of the available products, I personally use, recommend, sell, and install GeeWhiz.  As of this writing the current available version is 1.4.10, with v2.0 entering public beta.  I have been using the v2.0 beta product for 4 months now and I am quite impressed by its capabilities and features.

GeeWhiz performs Anti-SPAM scanning as well as Anti-Virus scanning (with any Anti-Virus software package) on both GroupWise and NetMail.  Version 2.0 adds multi-platform support, so that you can run it on NetWare, Linux, and Windows servers.

In my tests, GeeWhiz v2.0 has correctly rated and classified over 98% of my SPAM with a false positive rate of less than 0.0025% (that's fewer than 1 in 40,000 messages).  You may think that even 1 false positive is too many, but consider that a human being sorting mail by hand will make errors at a higher rate than this software.

I have completed a full review of this product, which is available in the Spring 2005 edition of GroupWise Advisor magazine (http://gwadvisor.com).

Summary

SPAM has truly become the pariah of the E-Mail world.  Spammers and system administrators have been playing a complex chess game, each trying to outsmart the other.  Legislation, litigation, and other strong-arm methods have not worked.  It is up to the latest breed of Anti-SPAM software to quash this torrent of useless, time-wasting, annoying, and often vulgar messages.  With the latest methods (including SURBL), we hope to finally win this battle, and it looks like we can!

