Web Magazine for Information Professionals

Search Engines Corner: Keyword Spamming - Cheat Your Way to the Top

Tracey Stanley shows how metadata can be abused to enhance the search engine ranking of Web pages.

In the increasingly market-driven environment that is the web, it is becoming ever more important to ensure that your site gets seen by as many people as possible. In particular, corporations and organisations that are trying to attract advertising revenue to their web sites need to show potential advertisers that placing an advert on their pages is likely to achieve maximum exposure. Under such competitive circumstances, web authors are increasingly beginning to rely on underhand techniques to ensure their sites get seen. One emerging practice is that of keyword spamming - doctoring the content of a web site to ensure that it hits the top of a list of results retrieved from a web search engine.

When you do a search on a search engine such as Excite or Alta Vista you might type in a keyword such as London. The search engine will then check for documents that include your requested keyword, and those documents will be ranked depending on how many times your keyword appears in the text, and where it appears. Thus, the results you get back will be ranked for their relevancy, and their placing in your list of results will depend on this relevancy ranking score. A search engine will typically display your results page by page, perhaps 10 hits at a time. Many users will look only at the first page of results they get from a search engine, rather than scrolling through pages and pages of hits. Therefore if you want to get people to see your web site it’s all-important to get into that top 10.
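The ranking described above can be illustrated with a toy scoring function. This is a minimal sketch only, not the algorithm of any real search engine: the function names, the sample documents and the idea of weighting a title hit three times as heavily as a body hit are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into simple words."""
    return re.findall(r"[a-z']+", text.lower())


def relevancy_score(keyword, title, body, title_weight=3.0):
    """Toy score: body occurrences plus weighted title occurrences.

    The title weighting is a hypothetical stand-in for "where the
    keyword appears"; real engines did not publish their formulas.
    """
    keyword = keyword.lower()
    body_hits = tokenize(body).count(keyword)
    title_hits = tokenize(title).count(keyword)
    return body_hits + title_weight * title_hits


def rank(keyword, documents, page_size=10):
    """Return the first page of (score, name) pairs, best score first."""
    scored = [
        (relevancy_score(keyword, doc["title"], doc["body"]), name)
        for name, doc in documents.items()
    ]
    # Documents that never mention the keyword are not hits at all.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:page_size]


# Hypothetical sample documents.
documents = {
    "guide": {"title": "London hotels",
              "body": "Hotels in London near London Bridge."},
    "news": {"title": "Weather report",
             "body": "Rain is expected in London today."},
    "other": {"title": "Paris cafes",
              "body": "Coffee and croissants."},
}
first_page = rank("london", documents)
```

Under a scheme of this kind, every extra occurrence of the keyword raises the score, which is exactly what makes repetition attractive to an author who wants to reach the first page.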

Keyword spamming in action can be seen if you take a look at the HTML source of a document. Search engine spiders will index documents using a range of different strategies. Some will simply index the entire text of a document, making full-text searching possible, but making it a little difficult to achieve high relevancy in your searches. Others will rely on meta tags in documents - tags in the head of a document which can be used to provide information on ownership and content. Alta Vista will use meta tags for indexing if these are present, in preference to indexing the first hundred lines of the document. Meta tags, which have been discussed in detail by other Ariadne contributors, are used quite legitimately for indexing of web documents, and where used appropriately they can be an extremely useful indexing tool which can make it easier for searchers to find materials which are relevant to their needs. Devious web authors, however, can exploit and abuse the use of meta tags in order to push their sites up the relevancy ratings.

A classic example of keyword spamming could be seen at the Heaven’s Gate web site - the web site of the San Diego cult who committed mass suicide recently. This example is taken from an article in Macworld [1]. The Heaven’s Gate web site used keyword spamming in a number of ways. Keywords which were shown to have a high incidence of people running searches against them were embedded in meta tags. An example of this is shown below [2]:

<meta name="keywords" content="Heaven's Gate, Heaven's Gate, Heaven's Gate, Heaven's Gate, Heaven's Gate, Heaven's Gate, ufo, ufo, ufo, ufo, ufo, ufo, space alien, space alien, space alien, space alien, space alien, space alien, extraterrestrial, extraterrestrial, extraterrestrial, extraterrestrial, extraterrestrial, extraterrestrial, millennium, millennium, millennium,millennium, millennium, millennium, millennium, misinformation, misinformation, misinformation, misinformation, misinformation, misinformation, freedom, freedom, freedom, freedom, freedom, freedom, second coming, second coming, second coming, second coming, second coming, second coming, angels, angels, angels, angels, angels, angels, end times, end times, end times, end times, end times, end times, Jesus, Jesus, Jesus, Jesus, Jesus, Jesus, God, God, God, God, God, God">

As can be seen, each keyword is repeated a number of times in order to force the site higher up a search engine’s list of hits. People searching on subjects such as millennium or extraterrestrial would have been likely to find this site near the top of their set of results.
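To see how heavily a page repeats its keywords, the keywords meta tag can be extracted and counted. The sketch below uses Python's standard html.parser module; the class name, the function name and the shortened sample page are hypothetical, and it assumes the keywords are comma-separated as in the example above.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the entries of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Split on commas and ignore empty entries.
            self.keywords.extend(
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            )


def keyword_counts(html):
    """Return a Counter of how often each meta keyword is listed."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return Counter(parser.keywords)


# Shortened, hypothetical sample in the style of the tag shown above.
sample = (
    '<html><head><meta name="keywords" content="ufo, ufo, ufo, ufo, '
    'ufo, ufo, angels, angels, freedom"></head></html>'
)
counts = keyword_counts(sample)
```

A legitimate keyword list would normally mention each term once, so a count well above one for any term is a reasonable first sign that a page has been stuffed.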

Another method used was that of hidden text. Black text was used against a black background near the bottom of the web page. This didn’t show up on screen as it was effectively masked by the background. However, it was visible in the HTML source of the document, where it was picked up and indexed by search engines. A huge number of keywords were slipped into the web site by stealth using this method. An archived copy of the Heaven’s Gate site is available [2].
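Hidden text of this kind can, in simple cases, be spotted by comparing the text colour with the page background. The following is a rough sketch under strong assumptions: it only understands the bgcolor attribute on the body tag and the color attribute on font tags, it treats colours as matching only when the two strings are identical (so "black" and "#000000" would not be equated), and the class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Record text whose <font color> equals the <body bgcolor>."""

    def __init__(self):
        super().__init__()
        self.background = None   # value of <body bgcolor="...">
        self.color_stack = []    # colours of the currently open <font> tags
        self.hidden = []         # text judged to be invisible

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self.background = (attributes.get("bgcolor") or "").lower() or None
        elif tag == "font":
            self.color_stack.append((attributes.get("color") or "").lower())

    def handle_endtag(self, tag):
        if tag == "font" and self.color_stack:
            self.color_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if (text and self.background and self.color_stack
                and self.color_stack[-1] == self.background):
            self.hidden.append(text)


# Hypothetical page: white text is visible, black-on-black text is not.
page = (
    '<body bgcolor="#000000">'
    '<font color="#FFFFFF">Welcome</font>'
    '<font color="#000000">ufo ufo angels</font>'
    '</body>'
)
finder = HiddenTextFinder()
finder.feed(page)
hidden_text = finder.hidden
```

The point of the sketch is simply that the masked words are ordinary text as far as the HTML source is concerned, so an indexer that ignores colour will pick them up like any other content.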

Another common technique is the use of words in the meta tags which don’t actually appear in the main body of a page. This effectively gives you two bites of the cherry - and a wider range of keywords under which your page might get picked up.
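A mismatch between the meta keywords and the visible page can be checked by comparing the two directly. This is a minimal sketch, assuming the keywords and the body text have already been extracted; the function name and the sample values are hypothetical, and a multi-word keyword is treated as present only if all of its words occur somewhere in the body.

```python
import re


def words_in(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keywords_missing_from_body(meta_keywords, body_text):
    """Return the meta keywords that do not appear in the body text."""
    body_words = set(words_in(body_text))
    missing = []
    for keyword in meta_keywords:
        parts = words_in(keyword)
        if parts and not all(part in body_words for part in parts):
            missing.append(keyword)
    return missing


# Hypothetical page about London hotels that also claims "cheap flights".
meta_keywords = ["London", "hotels", "cheap flights"]
body_text = "A guide to hotels in London."
missing = keywords_missing_from_body(meta_keywords, body_text)
```

Keywords that are absent from the body are not necessarily abuse, since synonyms and broader subject terms are a legitimate use of meta tags, so a result like this is best read as a prompt for a closer look.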

A slightly more worrying development is the tendency for web authors to include keywords in their documents which bear no relation to the subject of the document. One increasingly common trick is for web authors to fill their meta tags with keywords relating to sex and pornography, even if their site doesn’t actually contain this type of information. Thus, if someone runs a search against one of these keywords the site is included in the list of hits, even though it doesn’t actually cover this topic, and the unsuspecting web user who follows the link is misled into thinking they have found relevant material. As searches against sex-related keywords are extremely common on the web this is actually quite an effective way of boosting the hit rate of a site.

Some search engines are beginning to take active measures against keyword spamming by ignoring keywords which appear more than six or seven times in a row. Lycos in particular is now beginning to give lower priority to pages with long strings of repeated words [3]. However, web authors can get around this by reordering their keywords so that instead of having a list of keywords such as London London London hotels hotels hotels they might instead have London hotels London hotels London hotels. This would fool the search engine into letting the page through.
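Both the countermeasure and the way around it can be shown in a few lines. This is a sketch only: the threshold of six is taken loosely from the "six or seven times in a row" figure above, and the function names are hypothetical rather than a description of how Lycos or any other engine actually works.

```python
from itertools import groupby


def longest_run(words):
    """Length of the longest run of identical adjacent words."""
    return max((len(list(group)) for _, group in groupby(words)), default=0)


def looks_spammed(text, max_run=6):
    """Flag text in which a word repeats more than max_run times in a row."""
    return longest_run(text.lower().split()) > max_run


# The same fourteen words, first grouped and then interleaved.
grouped = " ".join(["London"] * 7 + ["hotels"] * 7)
interleaved = " ".join(["London", "hotels"] * 7)

grouped_flagged = looks_spammed(grouped)
interleaved_flagged = looks_spammed(interleaved)
```

The grouped list is flagged because London appears seven times in a row, while the interleaved list contains exactly the same words but never repeats one consecutively, so it passes. Counting total occurrences rather than adjacent runs would not be affected by the reordering, although it is not stated here whether any engine did so.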

Abuse of keywords is likely to become an increasing problem on the web as content continues to grow and web authors feel the need to adopt increasingly aggressive strategies in order to market their sites and get them seen by as wide an audience as possible.

References

[1] Getting to the Source: Is it Real or Spam, Ma’am, Liberatore, K., Macworld, 2 July 1997
http://www.macworld.com/features/pov.4.4.html

[2] Archive of the Heaven’s Gate Web Site,
http://www.sunspot.net/news/special/heavensgatesite/2index.shtml

[3] Cheaters never Win, Murphy, K., Webweek, 20 May 1996,
http://www.webweek.com/96May20/undercon/cheaters.html

Author Details

Tracey Stanley,
Networked Information Officer,
University of Leeds Library, UK
Email: T.S.Stanley@leeds.ac.uk
Personal Web Page: http://www.leeds.ac.uk/ucs/people/TSStanley/TSStanley.htm