Abstract
This paper looks at the major spam filtering techniques in current use. For
each method, both success rates and possible problems are explored. Methods
discussed include keyword filtering, open relay filtering, open proxy
filtering, dial-up filtering, non-confirming mailing list filtering,
cooperative sharing of spam samples, known spam origin filtering, Bayesian
filtering, Markovian discrimination, gray listing and challenge response.
The Problem
The first ever spam message was sent on March 5, 1994 (Moody, 2004). In
the last 11 years spam has expanded to comprise approximately 65% of all e-mail
(“Filtering Technologies in Symantec Brightmail AntiSpam 6.0,” 2004).
As spam becomes more prevalent it threatens to make e-mail unusable. With
this in mind, I will review several different approaches to spam filtering.
Special attention will be paid to how these different types of filters operate,
how they collect data and problems that the filters themselves can present.
In the last 11 years spam has grown from a problem that shocked Usenet users,
who could not believe that someone would be so crass as to advertise on the
Internet, but that didn’t hinder normal Internet communications, into a problem
so pervasive that national governments are trying to find a way to stop, or at
least limit, the amount of spam received by Internet users.
To get an idea of why spam is so despised by average e-mail users and systems
administrators alike you must look at the amount of spam that is sent on a daily
basis. Every day AOL filters 2.4 billion spam messages. That translates to
blocking 70 e-mails per user per day (Vaughan-Nichols, 2003).
As an example of how bad things can easily get if spam is not curtailed,
consider that there are 24 million small businesses in the United States. If 1%
of these companies got your e-mail address and each sent one message per year,
you would receive 657 extra e-mails every day (Schwartz, 2003).
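The 657 figure can be checked with a few lines of arithmetic. The sketch below
assumes a 365-day year; the variable names are illustrative and do not come
from the cited article.

```python
small_businesses = 24_000_000      # figure quoted from Schwartz (2003)
senders = small_businesses // 100  # 1% of them obtain your address
messages_per_sender_per_year = 1

extra_per_year = senders * messages_per_sender_per_year
extra_per_day = extra_per_year / 365  # assumption: 365-day year

# Roughly 657.5 messages per day, which the text reports as 657.
print(round(extra_per_day, 1))
```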
Beyond the annoyance factor, there is a cost to the spam recipient. This
cost can be either in lost productivity or the monetary cost of filtering spam.
Assuming that an employee can accurately delete all spam in thirty seconds per
day, a company with 10,000 employees can expect to spend $675,000 per year on
spam deletion (“The State of Spam,” 2003). Home users do not escape the
monetary cost either. AOL reports that it spends 15% of its users’ monthly
fees on fighting spam and responding to complaints (Gaspar & Gaudin, 2001).
It is obvious that spam costs the recipient a substantial amount of money,
yet the cost to the sender is minimal. Many anti-spam advocates go so far as to
say that spam is the equivalent of postage-due advertising, since the largest
part of the cost is borne by the recipient, not the sender.
One final consideration for why spam is a problem is that much of what
is sold is offensive or fraudulent. While there has not yet been a reported case
of a company being sued because an employee received offensive spam e-mail, many
human resource managers worry that this could happen. Since people sending spam
e-mail know nothing about their recipients, it is not uncommon for children to
receive sexually explicit e-mail. This is obviously a concern for many
parents. Finally, much spam advertises fraudulent merchandise. According to the
Federal Trade Commission, two-thirds of all spam contains deceptive or false
text (Cox & Dyrness, 2003).
Filtering Techniques
Considering the problems with spam it is not surprising that numerous
different techniques have been developed to automate the filtering and deletion
of spam. These techniques each have their relative strengths and weaknesses.
Static Black Lists
The oldest method of filtering spam is to use a blacklist. Blacklists are
static lists made up of people, words or groups that have a high probability of
being associated with spam. At the simplest level a blacklist can be a list of
specific e-mail addresses set up in an end user’s mail program.
Word Lists
The simplest form of blacklist is the word list. The idea is that
certain words should never show up in legitimate e-mail, so any e-mail that
contains one of those words must be spam. This type of filter is typically
deployed at the single user or at most the single domain level. The choice of
words and phrases is extremely important in this type of filtering because
almost any word can conceivably end up in a legitimate e-mail. In my
experience, the most effective way to use keyword filters is to filter on
domain names, e-mail addresses and carefully selected phrases found in existing
spam. Because rules are hard to write with enough precision, this type of
filter tends to have relatively high levels of false positives. One reason for
the high false positive rate is that a single use of a “bad” word can get a
message that otherwise looks completely innocent blocked. Static keyword
filters also require extensive upkeep to add new spam words. Spam
techniques and products change at a rapid rate, necessitating an equally rapid
change in filtered words. A static keyword filter that is reasonably successful
will become nearly useless within a few months if the keywords are not
continually updated.
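A minimal sketch of a static keyword filter may make the false positive problem
concrete. The phrase list and the function name below are hypothetical
placeholders, not entries taken from any real filter.

```python
# Hypothetical blocklist: one phrase and one sender domain seen in earlier spam.
BLOCKED_PHRASES = ["cheap meds online", "spam-example.invalid"]

def is_spam(message, phrases=BLOCKED_PHRASES):
    """Return True if any blocked phrase occurs anywhere in the message."""
    text = message.lower()
    return any(phrase.lower() in text for phrase in phrases)

# A single occurrence is enough to block an otherwise innocent message.
print(is_spam("Order from spam-example.invalid today"))
print(is_spam("My doctor warned me about sites selling cheap meds online."))
print(is_spam("Lunch at noon?"))
```

The second message is legitimate correspondence, yet it is blocked for quoting
the phrase, which is the kind of false positive described above.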
Open Relays
In the early Internet it was not uncommon for e-mail administrators to
allow anyone to send e-mail to anyone else regardless of whether either person
had an account on the server that was relaying the message. This behavior is the
definition of an open relay (“Open Relay Database FAQ,” 2004). Some of the
least scrupulous senders of spam use open relays as a way to hide their tracks
and offload most of the already low cost of sending their messages to a third
party. Spam operators that try to maintain a façade of legitimacy typically
avoid using open relays, for two reasons. First, the use of an open relay
destroys any hope of seeming legitimate. Second, it is hard to claim that use
of an open relay is not criminal computer trespass.
Blocking of open relays has certain advantages and disadvantages.
Blocking open relays will cut down the amount of spam received proportionally to
the amount of spam that is funneled through vulnerable systems. Unfortunately
some legitimate e-mail may also be blocked if a legitimate correspondent is
using an Internet Service Provider (ISP) or corporate mail server that has not
been properly secured. In today’s environment responsible system
administrators are very quick to fix any misconfiguration that might leave their
servers exposed as an open relay, so the amount of legitimate e-mail blocked
should be minimal.
Open Proxies
Open proxy blacklists are somewhat similar to open relay blacklists in
that they try to stop spam operators that target misconfigured servers. An open
proxy allows a spammer to send e-mail through a mail server that they would
typically not have access to by making them appear to the mail server as if they
were a local user (Farmer, 2003). Open proxy blacklists have similar advantages
and disadvantages to open relay filtering.
Dial-up Blacklists
Dial-up blacklists are lists that are designed to block any traffic that
comes from a network address that corresponds to a consumer-oriented ISP. These
may be actual dial-up accounts or high-speed Internet accounts. The idea behind
this type of list is that people in these networks should not be sending e-mail
directly to other e-mail servers. All e-mail should be sent through their ISP’s
e-mail server. Therefore there should not be any harm in blocking e-mail traffic
from the portions of these networks assigned to end users. A great deal of spam
has been sent using consumer ISP services through the years, so this does seem
like a logical approach. Some of these messages are sent when spam mailing
companies sign up for “throw away” Internet accounts. A relatively recent
twist in the spam story is that some spam mailing companies have begun to hire
virus writers to create viruses that allow them to send e-mail through infected
home computers that act as either open relays or open proxies (Leyden, 2004).
These infected computers are another reason that spam comes from parts of
the Internet that should not typically contain servers.
Consumer-oriented ISPs have been estimated to account for between 30% and
80% of all spam being sent today (Bray, 2004).
A large proportion of spam can therefore be stopped by simply blocking
anything that comes from a consumer ISP. The major problem with these lists is
that some small companies and home computer enthusiasts operate their own mail
servers but are not large enough or well funded enough to purchase service
from an ISP whose address space does not appear on these lists. There is a
simple solution to this problem: individuals or companies using services that
are predominantly consumer oriented should relay all e-mail through their
ISP’s mail server. Since some people do not understand the problem of
operating mail servers on these networks, there is still a false positive issue
that must be considered.
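The check a receiving server performs against such a list can be sketched as an
address-range lookup. The ranges below are reserved documentation networks
standing in for consumer address space, and the function name is hypothetical;
how real lists are published and queried is not covered here.

```python
import ipaddress

# Hypothetical "end user" ranges; these are reserved documentation networks.
CONSUMER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_listed(client_ip, ranges=CONSUMER_RANGES):
    """Return True if the connecting address falls inside a listed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in ranges)

print(is_listed("192.0.2.17"))      # inside the first range
print(is_listed("198.51.100.200"))  # outside the /25, e.g. the ISP's own relay
```

A small business running its own mail server on a listed address is rejected
just like an infected home computer, which is why relaying through the ISP’s
mail server avoids the problem.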
Non Confirmed Mailing Lists
Some mailing lists on the Internet do not confirm the legitimacy of new
subscriptions. These are typically referred to as single opt-in or
non-confirming mailing lists. Non-confirmed mailing list signups can be abused
by unscrupulous mailing companies who add people to mailing lists and then
claim they signed up. These lists can also be abused by malicious individuals
who subscribe a target to numerous lists as an annoyance. Some black hole list
operators consider these mailing lists spam regardless of whether there have
been complaints or not (“Detailed End User Information for MAPS NML
Listings,” 2004). These blacklist operators advocate double opt-in, or
confirmed, mailing lists. In a double opt-in list the person subscribes and
then receives a message that they must respond to, confirming that they really
want to subscribe to the mailing list.
Most, but not all, companies that operate legitimate mailing lists have
moved to double opt-in in an effort to stay off of blacklists. The disadvantage
of using this type of black hole list is that there may be some legitimate
mailing list e-mails that get dropped in the process of filtering out the spam.
As a general rule a false positive on a mailing list is considered less serious
than a false positive on a personal e-mail, but it can still be a problem.
Cooperative Spam Signatures
A method of filtering spam that is beginning to pick up popularity is
cooperative sharing of spam signatures. This technique is similar to the method
used by virus scanners in that a sample of a spam message is used to create a
hash of the message. Unlike virus scanners the hash creation is automated as
opposed to being a task undertaken by a human. Also unlike virus scanners all or
most of the message is used for hash creation while virus scanners typically
rely on finding unique signatures within virus programs. After a sufficient
number of people report the message as spam future recipients of the message
will be able to automatically filter the message (Mertz, 2002).
This method is by definition more reactive than some of the other systems
for spam filtering in that it relies on several people receiving and reporting
the same piece of spam before it will be filtered. There is a similar problem
inherent in signature-based virus scanners in that they cannot stop a new piece
of malicious software until they have seen samples to create signatures from.
Many spam mailers use hash busters, random text that makes each message
unique and therefore produces a different hash. The cooperative spam lists all
attempt to minimize the effect of hash busters by using only certain parts of
the message that are less likely to contain hash busters.
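The report-then-filter cycle can be sketched as follows. This is an
illustration under stated assumptions, not the algorithm of any particular
service: the threshold of three reports, the class and function names, and the
crude normalization (dropping any token that contains a digit) are all
hypothetical.

```python
import hashlib
from collections import defaultdict

REPORT_THRESHOLD = 3  # assumed number of distinct reporters required

def signature(body):
    """Hash a normalized body; tokens with digits are dropped as a crude
    stand-in for skipping the parts most likely to hold hash busters."""
    words = [w for w in body.lower().split() if not any(c.isdigit() for c in w)]
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()

class SignatureServer:
    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self.reporters = defaultdict(set)  # signature -> set of reporter ids

    def report(self, reporter, body):
        """Record that this reporter flagged the message as spam."""
        self.reporters[signature(body)].add(reporter)

    def is_spam(self, body):
        """Spam only once enough distinct people have reported the signature."""
        return len(self.reporters.get(signature(body), ())) >= self.threshold

server = SignatureServer()
server.report("alice", "Buy now! id 12345")
server.report("bob", "Buy now! id 67890")
print(server.is_spam("BUY NOW!  id 55555"))  # only two reports so far
server.report("carol", "buy now! id 24680")
print(server.is_spam("BUY NOW!  id 55555"))  # threshold reached
```

The first recipients still receive the message, which is the reactive
weakness noted above; later recipients are protected once the threshold is met.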
In theory there should be a near-zero false positive rate because e-mail must
be reported by multiple people and your legitimate e-mail should be impossible
to report since only you receive it. False positives can slip into the system
in three ways. First, people forget that they are subscribed to mailing lists
and report them as spam. Second, some current implementations of this method
allow system administrators to configure other spam filters to send a copy of
any e-mail that appears to be spam to the central signature server. This means
that if a mailing list gets incorrectly identified by other filters it may be
reported to the central server as well. Finally, it is possible, although
highly unlikely, that a legitimate e-mail and a spam e-mail could end up with
the same hash if the hashing algorithm creates hashes that are not perfectly
unique. In researching this I did not find any examples of this type of
theoretical false positive. In practice, false positives should mostly involve
mailing lists, meaning that while they are problematic they are less of a
problem than false positives on personal correspondence.
Known Spam Origin
The final type of static list I will discuss is the known spam origin
blacklist. These are lists of IP addresses that have previously sent spam
either to a user of the system or to a decoy address (Spews.org FAQ, 2004).
The major problem with the spam origin lists is that they are not
particularly effective and have one of the highest false positive rates of any
spam filtering technique. As an example, in research completed by Giga
Information Group the blacklist provider Mail Abuse Prevention Systems, LLC
(MAPS) was found to successfully block only 24% of spam and, more worryingly,
to have a 34% false positive rate (Gaspar & Gaudin, 2001).
One of the reasons for the high level of false positives by MAPS and some
other known spam origin lists is that a vigilante mentality can grow in the
groups that operate the lists. One common approach taken by these groups is to
block “spam support” organizations. What this often means in practice is
blocking an entire ISP’s network space if they cannot get the ISP to drop a
single spammer.
The policy of intentionally blocking innocent customers that happen to
share network space with a spammer is called overblocking. As an example of how
extreme the overblocking can be, in February of 2002 the Spam Prevention Early
Warning System (SPEWS) added all of Interland’s 400,000 customers to its
blacklist because Interland had not removed 100 customers that SPEWS accused of
spamming (Wagner, 2002, May 23).
These techniques are effective.
Many large ISPs have caved under the pressure of having their legitimate
customers blocked because they were allowing a few spammers to operate using
their network. While overblocking is effective for convincing ISPs to remove
known spam operators from their network it also leads to very high false
positive rates making these services unusable for anyone who considers false
positive results to be a problem.
Statistical Filters
There are a few different statistical models that have been discussed in
the academic literature, but few of them are currently used in production
products that identify their filtering method. I will discuss Bayesian
filtering and Markovian discriminators. It is possible that other statistical
models are in use in proprietary closed systems, but since they are by
definition closed it is impossible to evaluate them independently.
Bayesian filtering
Bayesian filtering is based on Bayes’ Theorem. The common
implementation assumes that the words in a given message are unrelated to each
other; the filter is thus intentionally naïve and is referred to as naïve
Bayesian filtering. A corpus of both spam and legitimate e-mail, referred to as
ham, is collected to base filtering on. The filter looks at each word in the
corpus and gives it a score by comparing the probability of that word
appearing in a spam message with the probability of it appearing in a ham
message. When looking at a new message the filter takes the scores for the
words in the message that have the highest probability of being either spam or
ham words and combines them into a score indicating whether the message is ham
or spam.
Properly trained naïve Bayesian filters have reported very high
filtering rates with some of the lowest false positive rates seen in any spam
filtering method. One technique that is used to reduce the number of false
positive results is the doubling of non-spam words. This means that a word found
in a non-spam message is counted as twice as important as the same word found
in a spam message. This biases the filter toward slightly higher false
negatives but substantially lower false positives. Since false positives are
the more significant problem, this is a logical tradeoff.
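The scoring described above can be sketched in a few dozen lines. This is a
simplified illustration, not the code of any product discussed here: the
tokenizer, the clamping to the range 0.01 to 0.99, the value 0.4 for unseen
words and the limit of 15 words per message are assumptions chosen for the
example. Only the doubling of ham counts comes directly from the description
above.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'$-]+", text.lower())

class NaiveBayesFilter:
    def __init__(self, ham_weight=2, top_n=15, unknown=0.4):
        self.spam_counts = Counter()  # messages containing word, spam corpus
        self.ham_counts = Counter()   # messages containing word, ham corpus
        self.n_spam = 0
        self.n_ham = 0
        self.ham_weight = ham_weight  # doubling of non-spam words
        self.top_n = top_n            # assumed number of words to combine
        self.unknown = unknown        # assumed score for never-seen words

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each word once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def word_prob(self, word):
        """Estimated probability that a message containing word is spam."""
        bad = self.spam_counts[word]
        good = self.ham_weight * self.ham_counts[word]
        if bad + good == 0:
            return self.unknown
        spam_freq = min(1.0, bad / max(self.n_spam, 1))
        ham_freq = min(1.0, good / max(self.n_ham, 1))
        return max(0.01, min(0.99, spam_freq / (spam_freq + ham_freq)))

    def score(self, text):
        """Combine the most decisive word scores into one spam probability."""
        probs = sorted((self.word_prob(w) for w in set(tokenize(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:self.top_n]
        spam_product = ham_product = 1.0
        for p in probs:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)

f = NaiveBayesFilter()
f.train("cheap pills buy now", is_spam=True)
f.train("buy cheap watches now", is_spam=True)
f.train("meeting agenda for monday", is_spam=False)
f.train("lunch on monday?", is_spam=False)
print(f.score("buy cheap pills") > 0.9)
print(f.score("monday meeting agenda") < 0.1)
```

Because `ham_weight` doubles the ham counts, a word seen equally often in both
corpora scores below 0.5, which is the bias toward false negatives over false
positives described above.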
One crucial issue for Bayesian filtering is the training of
the filter. The more e-mail the filter sees, the more accurate its assumptions
about words will become. The major weakness of Bayesian filtering is that it is
ideally used at the individual user level instead of at the mail gateway level.
Essentially, the filter is more capable of learning the quirks of a given
user’s good and bad words than it is of learning numerous users’ good and bad
words, since different people will have different requirements for what needs
to make it through the filter. Nevertheless, several products do successfully
implement naïve Bayesian filtering at the gateway level, even though the
success rates do take a hit (Graham, 2003). The more similar the group being
filtered, the more likely that naïve Bayesian filters will have results similar
to those of a single user. As an example, a group of doctors will be more
likely to receive drug names in their regular e-mail than the population as a
whole. If those doctors are grouped together, the false positive rate, at least
for those typically highly spam-indicative words, will remain low. If you group
those same doctors with the population as a whole, you will see a rise in the
doctors’ false positive rate, since most of the population does not receive a
great deal of legitimate e-mail containing large numbers of drug names. The
other users of the system may also see a slight decrease in the effectiveness
of the filters for drug-related spam.
An approach that tries to improve on existing Bayesian filtering is
looking at word groups and the number of times that words repeat. This is
probably the future of spam filtering as spam marketers become more adept at
circumventing the existing single-word Bayesian statistical spam filters. This
approach has many of the same advantages and disadvantages inherent in naïve
Bayesian filtering. The hope is that as the techniques are refined,
multiple-word filtering will improve accuracy even further. A disadvantage of
this form of filtering is that it takes much more storage space to store all of
the two-word combinations seen and their probabilities (Burton, 2004).
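The storage cost is easy to see by counting features. The sketch below, with a
hypothetical helper name and made-up sample messages, extracts adjacent word
pairs and compares the number of distinct pairs with the number of distinct
single words.

```python
def word_pairs(text):
    """Split text into lowercase words and return each adjacent pair."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

messages = ["buy cheap pills now", "cheap pills shipped now", "buy pills now"]
single_words = {w for m in messages for w in m.lower().split()}
pairs = {p for m in messages for p in word_pairs(m)}

# Distinct single words versus distinct two-word combinations.
print(len(single_words), len(pairs))
```

Even on three short messages there are more distinct pairs than distinct
words, and the gap widens as the corpus grows because the same vocabulary can
combine in many different orders.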
Markovian Discrimination
The major problem with naïve Bayesian filtering is that it
is by design naïve to the fact that word groupings are significant. Naïve
Bayesian filtering looks at each word independently without regard for the
words around it. This can lead to successful attacks on the system, such as
adding random words in an attempt to include clean words and thus reduce the
spam score. This is difficult for the spam mailer since each person will have
different spam and ham words, but it is a popular method used by many spam
mailing companies. Markovian discrimination looks at groups of words found in
spam mail. The more closely a group of words matches known spam word groups,
the higher the score given to the group. This group method has shown promise in
testing. The overall improvements are small, but this is primarily because
existing naïve Bayesian filtering is already hovering around a 99.9% success
rate, so improvements, while significant, may look minimal (Yerazunis, 2004).
In existing implementations false positive rates seem to be similar to those of
naïve Bayesian filtering.
Challenge Response
Challenge response works by sending a challenge message to
the sender every time a message is received from a new e-mail address. The
challenge can be anywhere from simply replying to the message to as difficult as
following a link and entering a code along with personal information.
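The hold-and-release logic can be sketched as a white list plus a queue of
held messages. The class and method names are hypothetical, the sketch models
only the simplest form of challenge (returning a token), and it does not send
any actual e-mail.

```python
import secrets

class ChallengeResponseGate:
    def __init__(self):
        self.whitelist = set()  # senders that have answered a challenge
        self.pending = {}       # token -> (sender, list of held messages)

    def receive(self, sender, message):
        """Return ("deliver", None) or ("challenge", token) for a message."""
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", None)
        for token, (held_sender, held) in self.pending.items():
            if held_sender == sender:  # already challenged, keep holding
                held.append(message)
                return ("challenge", token)
        token = secrets.token_hex(8)
        self.pending[token] = (sender, [message])
        return ("challenge", token)

    def answer(self, token):
        """On a valid answer, white list the sender and release held mail."""
        if token not in self.pending:
            return []
        sender, held = self.pending.pop(token)
        self.whitelist.add(sender)
        return held

gate = ChallengeResponseGate()
action, token = gate.receive("bob@example.com", "hello")
print(action)                                      # challenge
print(gate.answer(token))                          # ['hello']
print(gate.receive("bob@example.com", "again")[0]) # deliver
```

Note that the white list is keyed only on the claimed sender address, so
anything that can supply a white listed address passes without a challenge.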
Challenge response was practically 100% effective against
spam when it was first implemented. Some spam senders now forge the sender
address, using other known good addresses in a corporate domain or the address
of the very person the e-mail is sent to. These types of attacks can bypass
challenge response systems if the forged address has previously sent e-mail to
the recipient. Some spammers may also have an auto reply system set up that
will allow them to become part of the white list in challenge response systems
that simply require a reply from the sender.
The biggest problem with challenge response is that it has
a high rate of false positives and places a burden on legitimate senders of
e-mail. False positives occur for several reasons. People forget to white
list mailing lists that they have signed up for and then wonder why they quit
receiving e-mail. Some people refuse to reply to these systems out of principle,
stating that they feel that challenge response systems are akin to the recipient
saying “My time is more important than yours. Fix my spam problem for me.”
Finally, if two people who have not sent e-mail to each other are both using a
challenge response system they will never see each other’s challenges, since
their systems will continue to challenge each other. If challenge response
becomes a sufficient annoyance to spam operators they will eventually find ways
to automate replying to the challenges.
Gray Listing
Gray listing exploits the fact that some spam operators use
e-mail servers that blindly send e-mail but do not retry failed connections.
Gray listing has a side effect of also stopping mass mailer worms that have
their own mailing engine, since these mail engines typically send e-mail
blindly. Even spam operators that run full-featured e-mail servers may turn off
mail retries in an effort to save resources. A gray listing server sends a
temporary error message to the sending server. Legitimate mail servers receive
these and interpret them as a need to wait a period of time and then try again. All
legitimate e-mail should get through unless the sending server is massively
misconfigured. The major downside is that during the gray listing period there
is a communication delay that some end users may find unacceptable. Gray lists
typically cache servers that have successfully resent e-mail. This means that
until the cache expires there is only a delay the first time that someone sends
from an unknown server. Gray listing also has an unintended consequence of
producing some extra work for legitimate mail servers since they have to connect
multiple times to send a message. Gray listing will not stop all spam because
some spam operations will send retries. If gray listing becomes popular it is
likely that the effectiveness of gray listing will be reduced significantly as
spam mailing companies start to retry e-mail addresses when they get a temporary
failure. Gray listing will have a lower success rate than most other spam
filters but it can be effective when used in combination with other spam
filtering techniques.
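The accept-or-defer decision can be sketched as below. The class name, the
five-minute minimum delay and the cache lifetime are assumptions made for the
example, and the sketch keys its cache on the sending server’s address only.
A "tempfail" result stands for the temporary error (in SMTP, a 4xx reply) that
tells the sender to try again later.

```python
import time

class Greylist:
    def __init__(self, min_delay=300, cache_ttl=30 * 24 * 3600, clock=time.time):
        self.min_delay = min_delay  # assumed seconds before a retry is accepted
        self.cache_ttl = cache_ttl  # assumed lifetime of an approved entry
        self.clock = clock          # injectable clock, useful for testing
        self.first_seen = {}        # server address -> time of first attempt
        self.approved = {}          # server address -> approval expiry time

    def check(self, server_ip):
        """Return "accept" or "tempfail" for a connection from server_ip."""
        now = self.clock()
        expiry = self.approved.get(server_ip)
        if expiry is not None and now < expiry:
            return "accept"         # cached: this server retried successfully
        first = self.first_seen.setdefault(server_ip, now)
        if now - first >= self.min_delay:
            self.approved[server_ip] = now + self.cache_ttl
            del self.first_seen[server_ip]
            return "accept"         # a genuine retry after the waiting period
        return "tempfail"           # first contact, or retried too quickly

now = [0]
greylist = Greylist(min_delay=300, cache_ttl=1000, clock=lambda: now[0])
print(greylist.check("192.0.2.1"))  # tempfail on first contact
now[0] = 400
print(greylist.check("192.0.2.1"))  # accept after a proper retry
now[0] = 500
print(greylist.check("192.0.2.1"))  # accept immediately while cached
```

A sender that never retries is never accepted, while a legitimate server pays
the delay only on first contact until its cache entry expires.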
Problems with gray listing generally come down to systems not properly
handling rejected messages. Improperly handled messages can include mailing
lists removing a subscriber after a single bounced message. This is a fairly
aggressive stance for the mailing list operator since a temporary failure is
bound to happen from time to time with even the best maintained mail servers.
More worryingly, some versions of Lotus Notes have been reported not to
correctly handle messages that have a temporary failure condition. It is
possible that other mail transport agents have similar problems. Once again
this is a problem that should be fixed by the sending mail administrator, but
it must be considered when evaluating gray listing. One final annoyance is that
some sending servers may have extremely long retry times. This can lead to
e-mail delivery being delayed for several hours.
Combined Filters
System administrators have a final option when deciding to
implement a spam filter. This option is using a system that mixes the best of
the different types of systems to create an overall solution for that
organization’s spam problem. Most commercial packages and many of the open
source solutions make a mixed approach an option. The only approach that I have
seen commonly implemented without the use of any other methods as backup is
statistical filtering at the individual user’s mailbox level. By mixing
different approaches the administrator has the option to weight different
filtering techniques with an appropriate level of trust.
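One simple way to weight techniques by trust is a weighted average of their
individual verdicts. The filter names, weights and scores below are
hypothetical values chosen for illustration; real products may combine results
quite differently.

```python
def combined_score(results, weights):
    """Weighted average of per-filter spam scores, each between 0 and 1."""
    total_weight = sum(weights[name] for name in results)
    weighted_sum = sum(weights[name] * score for name, score in results.items())
    return weighted_sum / total_weight

# Hypothetical trust levels: the statistical filter is trusted most.
weights = {"bayesian": 3.0, "origin_blacklist": 1.0, "open_relay_list": 1.0}
results = {"bayesian": 0.9, "origin_blacklist": 1.0, "open_relay_list": 0.0}

score = combined_score(results, weights)
print(round(score, 2))
```

With these weights a hit on the low-trust origin blacklist cannot condemn a
message by itself, which limits the damage from that technique’s false
positives.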
Data Collection
The data collection method can have a significant impact on the
reliability of the results returned by the filter. As such, this is an important
consideration. Different methods are appropriate for different techniques.
Static blacklist filters have three major ways that they collect data.
They either scan for servers, use decoy addresses or use a nomination system.
Scanning for servers works well for open relay and open proxy blacklists.
Since these are both conditions created when a system administrator has
incorrectly configured the system in question it is easy for the blacklisting
service to scan for these servers. Actually, what the blacklisting services that
use server scanning do is fairly similar to what spammers looking for servers to
exploit do. Both groups will scan large portions of IP space for any servers
that are configured as open relays or open proxies. Only their motives differ.
Scanning servers does have the downside that, regardless of the intent, it
looks like an attack to monitoring systems on the scanned network. There have
also been issues with scans actually causing servers to crash due to bugs in
the server software (Wagner, 2002, March 20).
Decoy addresses are addresses that are specifically set up to receive
spam. A real user does not ever use these addresses so there should not be any
legitimate e-mail going to the address. Typically these addresses will be
included as hidden text inside of a web page or in other public forums where
addresses are harvested for spam. This allows automated programs that look for
e-mail addresses to find them without regular users getting snared in the spam
trap. Anyone who sends to one of these addresses is assumed to be a spammer and
added to the blacklist.
Nominations can be a blessing or a curse. They give real users who
receive spam an outlet to report the spam to someone who can hopefully do
something about it. Unfortunately there are issues with people sometimes
forgetting about subscribing to a mailing list and then later reporting it as
spam. I manage a small 30,000 user double opt-in list. I have subscribed to a
service through AOL so I see any messages sent to AOL users that generate a spam
complaint. In the last few weeks I have had at least one person every week send
a spam complaint about the mailing list confirmation message. These are, at
least in theory, people who a matter of a few minutes earlier had put their
e-mail address into a web form asking to receive e-mail from the list they are
complaining about. There are also always a few complaints every time we send a
message. I believe some of this may be because people read the subject and
mistake it for spam, but I also feel that a great many of these incorrect
classifications come from people forgetting that they signed up for the list.
As this anecdotal evidence implies, individual end users may not always be the
best way of choosing what is spam in a distributed system where many people may
be affected if they misclassify mailing list messages as spam.
The last method of data collection is statistical analysis. At the
simplest level this consists of a file that contains the spam probability of
every word that has previously been seen in an e-mail message. Based on these
probabilities new messages are assigned a probability of being spam.
More complex systems may use groupings of two or more
words. This should help to improve accuracy by looking at the writing style of
spam and legitimate messages. Multiple word statistical approaches will require
a much larger corpus of training messages to give the filter the ability to see
as many different combinations of word groups as possible.
How Spam Filtering Slows Spam
These approaches to spam filtering help fight the spam problem in two
ways. First, when spam is blocked, end users do not have as many garbage
messages to go through. Most people are not concerned with having to delete a
few garbage messages, but the amount of spam has reached a point where
individuals manually deleting spam either lose productivity by carefully
scanning through their e-mail or start creating false positives of their own
by accidentally deleting legitimate messages as spam. Most estimates put human
spam filtering at a lower success rate, in terms of both false positives and
false negatives, than well-trained statistical filters. Second, beyond
removing the annoyance factor for most users, as filters become more effective
more ISPs will be able to filter e-mail without having to worry about stopping
their clients’ legitimate e-mail. As messages are tagged or deleted before
they reach the end user it will be more difficult for spam senders to get
their message through to the very small minority that actually buy their
products. This will lead to higher costs of operation and lower profits for spam
mailing companies. If the filters are successful enough they may even remove the
profit motive completely.
Spam has made e-mail less useful and more expensive and the problem is
only getting worse. In late 2004 we stand at a 65% spam rate, and the amount of
spam doubles every 12-18 months. Now is the time for e-mail filtering to become
prevalent. There are numerous different methods of filtering, and the goals of
the person doing the filtering will to a great extent determine which method
they choose to employ. The question is no longer whether to filter spam but what
method to use in filtering e-mail.
References
Bray, H. (2004, June 9). Home PCs big source of spam.
Retrieved November 16, 2004, from http://www.boston.com/business/technology/articles/2004/06/09/home_pcs_big_source_of_spam/
Burton, B. (2004). SpamProbe - Bayesian spam filtering
tweaks. Retrieved October 17, 2004, from http://spamprobe.sourceforge.net/paper.html.
Cox, J. & Dyrness, C. (2003, May 28).
Spam prevention may lead to filtering of legitimate messages [Electronic
Version]. Knight Ridder Tribune Business News, 1.
Detailed end user information for MAPS NML listings.
(n.d.). Retrieved November 16, 2004, from http://www.mail-abuse.com/support/enduserinfo_nml.html.
Farmer, J. (2003, December 27). An FAQ for
news.admin.net-abuse.email part 3: understanding NANAE. Retrieved October 3,
2004, from http://www.spamfaq.net/terminology.shtml.
Filtering technologies in Symantec Brightmail AntiSpam
6.0. (n.d.). Retrieved November 15,
2004, from https://enterprisesecurity.symantec.com/content/displaypdf.cfm?SSL=YES&PDFID=1025.
Gaspar, S. & Gaudin, S. (2001, September 10). Spam
police. Network World, 18(37), 58-62.
Graham, P. (2003, January). Better Bayesian filtering.
Retrieved October 17, 2004, from http://www.paulgraham.com/better.html.
Leyden, J. (2004, May 14). Spam fighters infiltrate
spam clubs. Retrieved November 16, 2004, from http://www.theregister.co.uk/2004/05/14/spam_club/.
Mertz, D. (2002, August). Spam filtering techniques:
Comparing a half-dozen approaches to eliminating unwanted email. Retrieved
November 16, 2004, from http://gnosis.cx/publish/programming/filtering-spam.html.
Moody, G. (2004). Spam's tenth birthday today.
Retrieved November 16, 2004, from http://news.netcraft.com/archives/2004/03/05/spams_tenth_birthday_today.html
Open relay database FAQ. (n.d.). Retrieved November 16, 2004, from
http://www.ordb.org/faq/.
Schwartz, E. (2003, July/August). Spam wars. Technology Review, 106(6), 32-39.
Spews.org FAQ. (n.d.). Retrieved November 13, 2004,
from http://spews.org/faq.html
The state of spam: Impact & solutions. (2003,
January). Retrieved November 13, 2004, from
http://web.archive.org/web/20030621231814/www.brightmail.com/press/state_of_spam.pdf.
Vaughan-Nichols, S. (2003). Saving private e-mail. IEEE Spectrum, 40(8), 40-44.
Wagner, J. (2002, March 20). Facing Legal Challenge,
Blackhole List Closes. Retrieved January 8, 2005, from http://www.internetnews.com/dev-news/article.php/10_995251.
Wagner, J. (2002, May 23). When spam policing gets out
of control. Retrieved October 17, 2004, from http://www.internetnews.com/xSP/article.php/8_1143551.
Yerazunis, W. S. (2004). The Spam filtering Accuracy
Plateau at 99.9% Accuracy and How to Get Past It. 2004 MIT Spam Conference,
January 18, 2004. Retrieved
December 22, 2004 from http://www.merl.com/publications/TR2004-091/.
Note for
HTML version. If I refer to a print article that is what I used in my research.
I have attempted to find web versions for print articles that I used. There may
be some differences between print and online versions of articles.
Note on links in general. Unlike
print publications web sites can change their articles so there may be some changes
between when I originally looked at a web page and now. Archive.org
is a good way to look at web sites as they were at a previous date.