Web Focus: Using the Web to Promote Your Web Site
Many readers of this article will be involved in setting up new web sites, possibly for European or nationally-funded projects, for internal, institutional projects or perhaps for community projects. As the size of the web grows, there is an increasing awareness of the need to be proactive in promoting web sites: we can no longer simply sit back and expect visitors to arrive at our new site. This article describes a variety of approaches which can be taken to the promotion of a web site. The article is based on a presentation on "Promoting Your Project Web Site" [1] given at the "Consolidating The European Library Space" conference [2].
Submission to Search Engines
Many visitors to a web site will find it through a search engine. Although search engines can find new web sites automatically as they become linked into the web from existing web sites, the growth in the size of the web is making it increasingly difficult for indexing robots to keep up. It is probably desirable to be proactive and submit resources to search engines when a web site is launched.
Many of the main search engines provide an option to "Submit a Resource". Figure 1 illustrates the interface for submitting a resource to AltaVista.
Figure 1: Submitting a Resource to AltaVista
Since there are a number of popular search engines, and the search engines may limit the number of URLs which can be submitted, it may be desirable to make use of a submission application or web service.
A large number of submission programs are available including WebPosition [3], NetSubmitter [4], RegisterPro [5], Engenius [6] and the Exploit Submission Wizard [7].
In addition to the submission programs there are a number of web-based submission services including Broadcaster [8] and Submit-it [9].
An illustration of one of these products (WebPosition) is shown in Figure 2.
The products for submitting resources to multiple search engines typically provide other functions as well, such as analysing your pages, reporting on your position in search engines, creating metadata, etc.
Web Directories
Web directories such as Yahoo! are an alternative to search engines. They also provide a popular location for searching for resources. Unlike search engines, web directories are compiled manually. Web directories also provide an interface for submitting resources, as illustrated in Figure 3.
Figure 3: Submitting a Resource to Yahoo!
A number of the submission programs will automate the submission of resources to web directories as well as search engines.
Possible Problems
Can we solve the promotion of our web site by simply purchasing a submission program? Unfortunately not. Due to the sheer size of the web, search engines and directory services do not attempt to index all the resources they find.
- A sample of a web site may be indexed. Although the coverage of commercial search engines tends, for commercial reasons, not to be fully documented, it is believed that a number of search engines will only index a small sample (say 500 pages) of a web site.
- A robot may only index to a limited depth. Indexing robots may only index the "surface" of a web site and not follow links to resources which are located deep in the hierarchy.
- The user interface may present a barrier to the robot software. A number of indexing robots cannot process framed web sites or web sites with "splash screens".
- A robot may not index certain URL strings. URLs containing question marks (e.g. http://www.foo.com/get.asp?record=1) may not be indexed.
- A directory service may only catalogue complete web sites and not individual projects. It is believed, for example, that sub-domains have difficulties in getting into Yahoo!
Possible Solutions
Some possible solutions to the challenges listed above follow.
Domain Name
If a project has its own domain name it is more likely to be catalogued by a directory service such as Yahoo! In addition it is more likely to be fully indexed by a search engine than if it was part of a large web site.
The Robot Exclusion Protocol
Since search engines are likely to index only a small part of a web site it may be desirable to control the areas of the web site which are indexed. For example you may wish to exclude personal information, draft resources or experimental work from being indexed.
The Robot Exclusion Protocol (REP) enables a web site administrator to specify areas of the web site which should not be indexed. The REP makes use of a robots.txt file located in the root of the web server. A typical robots.txt file is shown in Figure 4.
User-agent: *          # Following apply to all robots
Disallow: /cgi-bin/    # Don't index /cgi-bin directory
Disallow: /tmp/        # Don't index /tmp directory
Figure 4: A Typical robots.txt File
The robots.txt file has a simple format and can be managed by hand. However a number of tools are also available to help you manage this file, such as RoboGen [10].
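Before publishing a robots.txt file it can be useful to check that the rules behave as intended. The following is a minimal sketch, assuming Python's standard urllib.robotparser module as the checking tool; the robot name "ExampleBot" and the host www.example.org are hypothetical and do not come from this article. It feeds the rules from Figure 4 to the parser and asks whether three sample paths may be fetched.

```python
from urllib.robotparser import RobotFileParser

# The rules shown in Figure 4.
ROBOTS_TXT = """\
User-agent: *          # Following apply to all robots
Disallow: /cgi-bin/    # Don't index /cgi-bin directory
Disallow: /tmp/        # Don't index /tmp directory
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory rather than fetched from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, robot: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) robot may fetch the given path."""
    # www.example.org is a placeholder host used only for illustration.
    return parser.can_fetch(robot, "http://www.example.org" + path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for sample_path in ("/index.html", "/cgi-bin/search", "/tmp/draft.html"):
        print(sample_path, is_allowed(rules, sample_path))
```

A check of this kind only shows how one parser interprets the file; individual robots may interpret the rules differently or ignore them altogether.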
Robot Exclusion in HTML
Although the Robot Exclusion Protocol is conceptually very simple, in practice it may be difficult to exploit since updating the robots.txt file is likely to be restricted to the web site administrator. Fortunately there is now a HTML feature which enables authors of HTML pages to control access to their pages. The following HTML element located in the HTML HEAD:
<meta name="robots" content="noindex, nofollow">
will prevent robots from indexing the resources and following links within the resource.
Further information on the Robot Exclusion Protocol and Robots META tag has been produced by Martijn Koster [11].
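If many authors maintain their own pages it may be helpful to report which pages carry a robots META tag. The sketch below is one possible approach, assuming Python's standard html.parser module; the class and function names are illustrative only. It collects the directives found in any robots META element in a page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A page with no robots META tag yields an empty set, which robots will normally treat as permission to index the page and follow its links.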
Web Site Design
Avoid use of frames and splash screens in your web site design. As well as enabling indexing robots to access resources on your web site this also has additional accessibility benefits (visitors with browsers which do not support frames will still be able to access your web site).
Improving Search Results
Once the key pages in your web site have been indexed by a search engine you might expect a sensible query to retrieve the resources. Unfortunately the resource may fail to be located near the top of the search results. How can you improve the ranking?
Metadata
Metadata may help to improve the ranking. Simple keywords and description metadata, as illustrated below, are desirable since this metadata is used by a number of search engines, including AltaVista:
<meta name="keywords" content="exploit, web magazine, TAP, telematics">
<meta name="description" content="Exploit Interactive is a ..">
Dublin Core metadata provides a more comprehensive and standardised approach to metadata for resource discovery. Unfortunately it is not yet widely supported by the major search engines. It is probably worth implementing Dublin Core metadata if you can make use of it to enhance local searching and you can address the maintenance of the metadata.
An example of the use of metadata to enhance local searching, and of an architecture to manage the metadata, can be seen in the Exploit Interactive web magazine [12]. The search interface is illustrated in Figure 5.
Figure 5: The Exploit Interactive Search Interface
As illustrated in Figure 5 the search facility can be used to search the full text of articles, the author of an article (using the DC.Creator Dublin Core attribute) or the description (using the DC.Description Dublin Core attribute).
The metadata is stored in a neutral format (as variables in an "Active Server Page"). A server side include (SSI) is used to transform the metadata to the appropriate format. Currently the metadata is transformed into <meta name="DC.Creator" ...> and <meta name="DC.Description" ...>. However, providing the metadata in, say, RDF would simply require a single update to the SSI script.
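The idea of holding metadata once and rendering it in whatever format is needed can be sketched as follows. This is not the Exploit Interactive implementation, which uses Active Server Pages and a server side include; it is a minimal illustration in Python, in which the record values, the function names and the www.example.org URL are all hypothetical. The same record is rendered either as Dublin Core META tags or as a simple RDF/XML description.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape, quoteattr

# Neutral store: one record per article (values are hypothetical).
RECORD = {
    "Creator": "Brian Kelly",
    "Description": "An article on promoting web sites",
}


def to_meta_tags(record: dict) -> str:
    """Render the record as Dublin Core <meta> elements for the HTML HEAD."""
    return "\n".join(
        f'<meta name="DC.{name}" content="{html_escape(value)}">'
        for name, value in record.items()
    )


def to_rdf(record: dict, about: str) -> str:
    """Render the same record as a simple RDF/XML description."""
    lines = [
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '         xmlns:dc="http://purl.org/dc/elements/1.1/">',
        f"  <rdf:Description rdf:about={quoteattr(about)}>",
    ]
    for name, value in record.items():
        element = "dc:" + name.lower()
        lines.append(f"    <{element}>{xml_escape(value)}</{element}>")
    lines += ["  </rdf:Description>", "</rdf:RDF>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_meta_tags(RECORD))
    # The URL below is a placeholder, not a real article address.
    print(to_rdf(RECORD, "http://www.example.org/issue1/article/"))
```

Because every page calls the same rendering function, a change of output format is made in one place rather than in every article.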
The approach taken by Exploit Interactive provides enhanced searching for visitors to the web site, Dublin Core metadata which could be used by third party applications and an architecture which helps to minimise ongoing maintenance.
Citation
So far we have considered techniques which will ensure that a web site is indexed and ways of improving the ranking. We should also take into account the citation of web sites - for example URLs which are included in articles (both online and print), used in publicity materials or spoken (e.g. when giving talks or presentations or on the phone).
Domain Name
The domain name for the web site can affect promotion of a web site in a number of ways. For example short and memorable domain names:
- Are easy to remember.
- Can be easily used in promotional materials.
- Are more likely to be indexed by search engines and directory services (as described above).
UKOLN uses the names www.exploit-lib.org and www.ariadne.ac.uk for its Exploit Interactive [12] and Ariadne [13] web magazines. Both of these domain names are short and easy to remember.
Use of separate domain names or qualified domain names - sometimes used by departments (such as http://www.scs.leeds.ac.uk/) and sometimes for a particular function (such as Student Home Pages at Loughborough University - see http://www-student.lboro.ac.uk/) - appears to be on the increase. This is probably due to (a) the ease and low cost of obtaining domain names and (b) the increase in expertise and knowledge of running web servers.
URL Conventions
As well as having a short, memorable domain name it is also desirable to make use of short URLs. Before releasing your web site it is useful to develop guidelines for URL naming conventions. Some suggestions are given below:
- Scalable Naming Conventions: You should ensure that your naming conventions are scalable so that a re-organisation of your directory structure is not needed in, say, two years' time.
- Avoid Unusual File Extensions: You should try to avoid use of unusual file name extensions. For example files ending in .asp or .cgi, and URLs which contain question marks (e.g. get.asp?record=1), are difficult to cite and may fail to be indexed by indexing robots. It should be noted that this suggestion may conflict with information management requirements (e.g. it may be desirable to store information in a backend database). If resources are accessed using a CGI script or a similar method, it is advisable to try to ensure that URLs which appear to be static are provided. A number of techniques, such as Apache rewrites, can be used.
- Make Use of Directory Defaults: Use of the default file names for directories can help in shortening the length of URLs and avoiding ambiguities in file extensions. For example the URL for an article could be referred to as http://www.exploit.org/issue1/pride/article.htm but this could easily be confused with http://www.exploit.org/issue1/pride/article.html. If the article has a file name which is the web server's default file name when a directory is requested (such as into.htm or into.asp), not only will this ambiguity be resolved, but the URL will be shorter.
- Avoid Citation of Binary Files: When referring to, say, an individual document or presentation it is advisable to cite a HTML resource. For example URLs of the form http://www.foo.org/presentations/talk-dec1999.ppt should be avoided as (a) not all potential readers will have access to a PowerPoint viewer; (b) it is not possible to provide links to alternative versions of the resource and (c) it will be difficult to provide additional information related to the presentation.
Jakob Nielsen's Alertbox column provides some valuable comments on the "URL as UI" [14].
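The URL guidelines above lend themselves to a simple automated check. The sketch below, in Python, flags URLs which contain a query string, an unusual extension or a binary file extension, and suggests citing the directory when a default file name is used. The extension lists follow the examples in the text; the default file names (index.htm and index.html) are an assumption, since the actual defaults depend on how your web server is configured.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

UNUSUAL_EXTENSIONS = {".asp", ".cgi"}             # examples from the guidelines
BINARY_EXTENSIONS = {".ppt"}                      # example from the guidelines
DEFAULT_FILE_NAMES = {"index.htm", "index.html"}  # assumed; server-dependent


def check_url(url: str) -> List[str]:
    """Return a list of warnings for a URL; an empty list means no issues found."""
    warnings = []
    parts = urlparse(url)
    file_name = posixpath.basename(parts.path).lower()
    extension = posixpath.splitext(file_name)[1]

    if parts.query:
        warnings.append("contains a query string, which robots may not index")
    if extension in UNUSUAL_EXTENSIONS:
        warnings.append(f"unusual file extension {extension}")
    if extension in BINARY_EXTENSIONS:
        warnings.append(f"binary file {extension}; consider citing a HTML page")
    if file_name in DEFAULT_FILE_NAMES:
        warnings.append("default file name; the directory URL is shorter")
    return warnings


if __name__ == "__main__":
    for candidate in (
        "http://www.foo.com/get.asp?record=1",
        "http://www.foo.org/presentations/talk-dec1999.ppt",
        "http://www.exploit-lib.org/issue1/pride/",
    ):
        print(candidate, check_url(candidate))
```

A script of this kind could be run over a list of URLs before they appear in publicity materials, although the final judgement on what is short and memorable remains a human one.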
Giving Away Your Web Site
As well as the various suggestions on ways in which you can enhance the visibility of your web site you may also wish to consider giving the web site away! For example you could:
- Give parts of your web site away. We have seen how metadata provides structured information about your web site which is given away to robot software.
- Give parts of the user interface to your web site away.
- Give your entire web site away.
Figure 6 shows an interface for searching for medical information on the web which is available on the OMNI web site [15].
Figure 6: The Interface for Searching for Medical Information on the Web at OMNI
This type of interface is probably more likely to generate search requests than a page simply containing links to the remote search interface. There are dangers in encouraging remote web sites to install a search interface to your web site search engine, in particular change control if you decide to introduce a new or updated search engine. However this is an option you may wish to consider.
You may wish to give your entire web site away. A mirror of your web site may enhance its visibility. If this is an option for your web site you may need to structure your web site so that it can easily be mirrored. This will include using directories to delineate areas of your web site which are to be mirrored, appropriate use of relative URLs and, if you use server-side scripting for management purposes, hiding (or rewriting) unusual URLs where possible. Although sophisticated mirroring and replication software is now available, the mirroring task will probably be much easier if the site has been developed with mirroring in mind. It should also be noted that this may help in the digital preservation of a web site.
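One practical check when preparing a site for mirroring is to look for links which name your own host explicitly, since these will continue to point at the original server when the pages are copied elsewhere. The following Python sketch, which assumes the standard html.parser module and uses the hypothetical host www.example.org, lists such links in a page. Relative links and links to other sites are ignored.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def absolute_internal_links(html: str, site_host: str) -> List[str]:
    """Return links which name site_host explicitly and so would not follow a mirror."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        link
        for link in collector.links
        if urlparse(link).netloc.lower() == site_host.lower()
    ]


# A hypothetical page mixing absolute, relative and external links.
PAGE = """
<a href="http://www.example.org/issue1/">Issue 1</a>
<a href="../issue2/">Issue 2</a>
<img src="logo.gif" alt="Logo">
<a href="http://www.other.example.com/">Elsewhere</a>
"""

if __name__ == "__main__":
    print(absolute_internal_links(PAGE, "www.example.org"))
```

Links reported by such a check are candidates for conversion to relative URLs before the site is offered for mirroring.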
Publications
This article has described the submission of resources to search engines and web directories, and web architectures which will help to make web sites more accessible to search engines. It should be noted that articles about your web site can also help in its promotion. Articles in print and web publications should obviously raise its visibility. In addition, web magazines may submit their pages to search engines, and links in the pages may be harvested. Web magazines may also be made available on CD ROM, in free text systems, citation reports, etc. As an example, a number of Ariadne articles have been cited in Current Cites [16] and Ariadne itself features in PubList's Internet Directory of Publications [17].
Evaluation
If you have followed the various suggestions given in this article how can you evaluate the effectiveness and assess the benefits against the resources used?
Monitoring Links to Your Web Site
One suggestion would be to monitor the number of links pointing to your web site. The LinkPopularity.com web site [18] enables the numbers of links, as recorded by a number of large search engines, to be measured as illustrated in Figure 7.
Figure 7: The LinkPopularity.com Web Site
Monitoring the number of links to your web site, and the growth in the number of links, will be useful in evaluating the impact of your web site. It can also be of use if you wish to sell advertising space on your web site. As Roddy McLeod, manager of the EEVL gateway [19], mentioned in a posting to the lis-elib Mailbase list:
"I tried [LinkPopularity.com], pointing out to a potential advertiser that EEVL had, according to HotBot, 1099 sites linking to it, whilst there were only 18 sites linking to their site, and suggested that what they needed was more exposure. It seems to have worked, as they have agreed to buy an ad on the soon to be released new design EEVL site." [20].
Analyse Your Web Statistics
Analysis of your web statistics can help in measuring the effectiveness of your web promotion strategy. A more thorough report on web statistics will be published at a later date. In this article mention will be made of analysis of access to web sites by robot software. The BotWatch software [21] can produce reports on access to your web site by robot software, as illustrated in Figure 8.
Figure 8: BotWatch
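A rough idea of robot activity can also be obtained directly from the server log files. The sketch below does not reproduce how BotWatch works; it is a simple heuristic in Python which assumes that the server writes the widely used "combined" log format, and which treats a request as coming from a robot if the user agent contains one of a few hint words or if the same host has requested /robots.txt. The hint words and the sample log lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: host ident user [date] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

ROBOT_HINTS = ("bot", "crawler", "spider")  # assumed hint words, not exhaustive


def count_robot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per user agent for hosts which look like robots."""
    entries = [m.groupdict() for m in map(LOG_PATTERN.match, lines) if m]
    # Hosts which fetched /robots.txt are very likely to be robots.
    robot_hosts = {e["host"] for e in entries if e["path"] == "/robots.txt"}
    hits = Counter()
    for entry in entries:
        agent = entry["agent"]
        looks_like_robot = any(hint in agent.lower() for hint in ROBOT_HINTS)
        if entry["host"] in robot_hosts or looks_like_robot:
            hits[agent] += 1
    return hits


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Dec/1999:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleCrawler/1.0"',
    '192.0.2.10 - - [01/Dec/1999:10:00:05 +0000] "GET /issue1/ HTTP/1.0" 200 5120 "-" "ExampleCrawler/1.0"',
    '192.0.2.77 - - [01/Dec/1999:10:01:00 +0000] "GET /issue1/ HTTP/1.0" 200 5120 "http://www.example.org/" "Mozilla/4.0 (compatible; MSIE 5.0)"',
]

if __name__ == "__main__":
    for agent, count in count_robot_hits(SAMPLE_LOG).most_common():
        print(count, agent)
```

Counts of this kind show which robots have visited and how much of the site they requested, which can be compared with the areas of the site you most want to be indexed.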
Conclusions
Ideally you will think about the promotion of your web site before the web site has been launched. A number of technical decisions which can help with web site promotion should be made before the launch as changes to a running service will be difficult to implement. However even if your web site is well-established many of the suggestions in this article will still be relevant.
Many of the suggestions given in this article on web site promotion will have additional benefits in other areas. For example:
- Robots and people with disabilities (e.g. blind users) have similar characteristics, e.g. they cannot see images, may not be able to access framed sites, etc.
- Indexing programs may index alt attributes in img elements.
- Sensibly-structured web sites can be more easily archived and mirrored.
- Metadata for general resource discovery can be reused for other applications (e.g. current awareness services).
Further Information
Additional useful information on web site promotion is provided by Deadlock Design [22], SearchEngineWatch [23], VirtualPROMOTE [24], Pegasoweb [25], did-it [26] and Yahoo! [27].
Book reviews for "Poor Richard's Internet marketing and promotions: how to promote yourself, your business, your ideas online" [28] and "How to promote your Web site effectively" [29] have been published in the Internet Resources Newsletter.
Checklist
A checklist of the points mentioned in this article follows.
References
1. Promoting Your Project Web Site, Brian Kelly
   http://www.ukoln.ac.uk/web-focus/events/concertation/libraries-nov99/
2. Consolidating The European Library Space Conference, DG Information Society Cultural Heritage Applications Unit
   http://www2.echo.lu/libraries/events/FP4CE/agenda.html
3. WebPosition Gold
   http://www.webposition.com/
4. Net Submitter Professional
   http://www.netsubmitter.com/
5. Register Pro
   http://www.registerpro.com/
6. Engenius
   http://www.pegasoweb.com/engenius/
7. Exploit Submission Wizard
   http://www.exploit.com/wizard/
8. Broadcaster Website Promotion, Broadcaster
   http://www.broadcaster.co.uk/
9. Submit it!: Web Site Promotion and Marketing, Submit it!
   http://www.submit-it.com/
10. Robogen
   http://www.rietta.com/robogen/
11. Robots Exclusion
   http://info.webcrawler.com/mak/projects/robots/exclusion.html
12. Exploit Interactive
   http://www.exploit-lib.org/
13. Ariadne
   http://www.ariadne.ac.uk/
14. URL as UI
   http://www.useit.com/alertbox/990321.html
15. Searching for Medical Information on the Web, OMNI
   http://www.omni.ac.uk/other-search/
16. CurrentCites Bibliography on Demand (search for Ariadne), CurrentCites
   http://sunsite.berkeley.edu/CurrentCites/bibondemand.cgi?query=ariadne
17. Ariadne Main Information Page, PubList
   http://www.publist.com/cgi-bin/show?PLID=4931361
18. LinkPopularity.com
   http://www.linkpopularity.com/
19. EEVL
   http://www.eevl.ac.uk/
20. lis-elib archive, Mailbase
   http://www.mailbase.ac.uk/lists/lis-elib/1999-11/0015.html
21. BotWatch
   http://www.tardis.ed.ac.uk/~sxw/robots/botwatch.html
22. Art of Business Web Site Promotion, Deadlock Design
   http://www.deadlock.com/promote/
23. Search Engine Submission Tips, SearchEngineWatch
   http://www.searchenginewatch.com/webmasters/
24. Web Site Promotion, VirtualPROMOTE
   http://www.virtualpromote.com/promotea.html
25. Web Site Promotion, PegasoWeb
   http://www.pegasoweb.com/
26. did-it
   http://www.did-it.com/
27. Yahoo!
   http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Information_and_Documentation/Site_Announcement_and_Promotion/
28. Recent Internet Books in Heriot-Watt University Library, Internet Resources Newsletter, Issue 58, July 1999
   http://www.hw.ac.uk/libWWW/irn/irn58/irn58d.html#recent
29. Recent Internet Books in Heriot-Watt University Library, Internet Resources Newsletter, Issue 59, August 1999
   http://www.hw.ac.uk/libWWW/irn/irn59/irn59d.html#recent
Author Details
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY
Email: b.kelly@ukoln.ac.uk
Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.