Web Magazine for Information Professionals

Web Watch: An Update On Search Engines Used In UK Universities

Brian Kelly with an Update On Search Engines Used In UK Universities.

In September 1999 we published our first report [1] on the search engines which were used to provide search facilities on UK University Web sites.

Since then we have updated our survey at roughly six-monthly intervals.

Following a recent update, we will now discuss the findings and comment on the trends we have observed.

Latest Findings and Trends

The latest findings, which contain details of the search facilities on UK University Web sites, have now been published [2]. A summary of the findings of the first and most recent surveys is given below.

Histogram of findings

Table 1: Summary Of Surveys in Sept 1999 and Dec 2001

Search Engine          Sept 1999   Dec 2001
ht://Dig                      25         49
eXcite                        19          5
Microsoft                     12         18
Harvest                        8          1
Ultraseek / Inktomi            7         12
Google                         0         13
Other                         29         31
None                          60         30
Total                        160        159

Notes
1 - Figures for Inktomi also include Ultraseek.
2 - Three institutions are included twice, as they provide both ht://Dig and Google.
3 - A number of institutions no longer exist, due to mergers.

From this summary we can see that ht://Dig has consolidated its position as the most widely used search engine. It is now used by about 30% of institutions (49, up from 25), almost double the number of two years ago. The second most popular search engine is a Microsoft solution (which can be one of several, such as Microsoft's Index Server, the SiteServer Catalogue Server or the FrontPage indexer). In third place is Google, which was not available two years ago but is now used by thirteen institutions (including three which also use ht://Dig). Closely following is Inktomi/Ultraseek (Inktomi took over Ultraseek last year, so the figures are combined).

There are no other search engines in widespread use: the next most popular is eXcite, which is no longer being developed, followed by search engines which are each used by only one to three organisations.

The number of institutions which do not have (or do not appear to have) a search facility has dropped by 50%, from 60 to 30.
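The percentages quoted above can be checked directly against the counts in Table 1. The following is a minimal sketch rather than part of the survey itself: the function names are illustrative, and because three institutions are counted twice, the shares are strictly shares of table entries rather than of distinct institutions.

```python
# Counts transcribed from Table 1: (first survey, Dec 2001 survey).
SURVEY = {
    "ht://Dig": (25, 49),
    "eXcite": (19, 5),
    "Microsoft": (12, 18),
    "Harvest": (8, 1),
    "Ultraseek / Inktomi": (7, 12),
    "Google": (0, 13),
    "Other": (29, 31),
    "None": (60, 30),
}
TOTALS = (160, 159)  # totals as given in Table 1


def share(count, total):
    """Percentage of all table entries, rounded to one decimal place."""
    return round(100 * count / total, 1)


def change(first, latest):
    """Percentage change between the two surveys.

    Returns None when the first count is zero, since a percentage
    change from zero is undefined (as for Google).
    """
    if first == 0:
        return None
    return round(100 * (latest - first) / first, 1)


if __name__ == "__main__":
    for engine, (first, latest) in SURVEY.items():
        print(f"{engine:22} share: {share(latest, TOTALS[1]):5}%  "
              f"change: {change(first, latest)}")
```

Summing each column also reproduces the totals given in the table, which is a useful check that the figures have been transcribed correctly.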

Discussion

The Need To Update Facilities

In the mid to late 1990s a wide range of search engines were used by UK Universities. These included many which had been developed by the research community, such as Harvest, wwwwais, Isearch and SWISH.

However, many of these tools are no longer adequate: the end user interface and functionality may be inadequate, the administrator may be given only limited functionality, or there may be security concerns.

Two examples of the interface provided by dated search engines are shown in Figure 1.

Figure 1: Output From the eXcite and WAIS Search Engines

It can be seen that the summary details provided by eXcite can contain JavaScript code, rather than the textual information end users will expect. It should also be noted that a security bug in eXcite was reported in January 1998 [3] and the software is not believed to have been updated since then.

In contrast we can see the interface provided by more up-to-date search facilities such as ht://Dig and Google, which are illustrated below.

Figure 2: ht://Dig and Google Search Facilities

It can be seen that the current generation of search engines displays the context of the search term and provides extra functionality, such as searching for similar pages.

Type of Search Facility

We have now reached a period of consolidation, with use of search engines such as Isearch, Harvest, etc. limited to one or two institutions. Most institutions are now taking one of the following three strategies:

A well-used open source package
    ht://Dig is the only package which falls into this category.
A licensed package
    This includes one of the Microsoft packages and Inktomi/Ultraseek.
An externally-hosted search facility
    Google is the most popular example of a search facility which is hosted by a third party organisation.

ht://Dig is a safe solution, widely used within the community, and it is easy to set up, to configure and to give a local look-and-feel. ht://Dig is an open source solution, which means that the software is free to use. Being open source also means that the source code of the software is freely available and can be modified and developed. However, there is little indication that ht://Dig is being actively developed.

A significant number of institutions have paid for a licensed solution, the most popular being Inktomi/Ultraseek or a Microsoft product (although it should be mentioned that the Microsoft search facility is probably bundled with the Microsoft server software and is not purchased separately).

The third most widely used approach is to make use of an externally-hosted search facility. The most popular is Google, although this category also includes Atomz and Picosearch.

It should also be noted that a small number of institutions are beginning to provide access to more than one search facility: for example ht://Dig and Google (e.g. The University of Reading, as shown below).

Figure 3: University Of Reading Offer Two Search Engines

No Search Facility Available

As has been noted, the number of institutional Web sites which provide no search facility (or whose search facility is not easily found) has decreased by 50% in the past two years. Although this trend is pleasing there are still 30 institutions whose users would benefit from such a service. Externally-hosted search facilities such as Google can provide a search facility with very little effort or technical expertise required. Contact will be made with Web managers at these institutions to inform them of this.

It has also been noted that a small number of institutions which previously provided a search facility no longer do so. Typically such institutions provide a page containing a message along the lines of "The search facilities are currently being redeveloped. A new, improved service will be available shortly." Unfortunately 'shortly' can take a long time to arrive! If search facilities are withdrawn (if, say, a search engine is found to have a security hole) it may be worth providing an externally-hosted search facility as an interim measure.

Changing The Search Facility

Over the past two years many institutions have changed their search facility. It has been noticed that when a new search facility is introduced many organisations will change the URL of the search page. This is normally due to the name of the search facility forming part of the URL (e.g. <www.foo.ac.uk/htdig/>).

It would not be surprising if we see further changes: for example institutions moving to ht://Dig, providing multiple search options or moving from ht://Dig to search facilities which provide greater functionality (note, for example, that ht://Dig's support for metadata is very limited).

In order to minimise disruption when changes are made institutions are advised to ensure that the URL of their search facility is general and is not tied to a particular product.
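One way to picture this advice is a small lookup which sends requests for retired, product-specific addresses to a single generic search address. This is only a sketch under stated assumptions: the path `/search/` and the list of legacy prefixes are hypothetical, and in practice the redirect would normally be configured in the Web server rather than written as application code.

```python
# Hypothetical generic address for the search page, not tied to any product.
GENERIC_SEARCH_PATH = "/search/"

# Hypothetical product-specific paths which earlier search pages may have used.
LEGACY_PREFIXES = ("/htdig/", "/excite/", "/harvest/")


def redirect_target(path):
    """Return the generic search path for a legacy URL path.

    Returns None when the path does not belong to a retired search page,
    meaning no redirect is needed.
    """
    for prefix in LEGACY_PREFIXES:
        # Match both "/htdig" and anything beneath "/htdig/".
        if path == prefix.rstrip("/") or path.startswith(prefix):
            return GENERIC_SEARCH_PATH
    return None
```

With such a mapping in place, a bookmark to an old address such as `/htdig/` continues to work after the search engine has been replaced, and only the list of legacy prefixes needs to grow over time.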

The Future

What developments can we expect to see in the future? An important one is likely to be the use of hybrid search facilities which enable users to find resources which are located not only on an institution's Web site(s) but in other institutional systems, such as the University library catalogue. This approach can be extended to address the end user's need to find a quality resource, wherever it may be located.

This is the approach which was taken by Hybrid Library projects funded by the eLib Programme, for example the BUILDER project [4], which developed a search facility which could be used to search across a number of distributed services.

This approach of providing users with a search interface to a wide range of distributed quality resources is being addressed by JISC's DNER (Distributed National Electronic Resource) work [5].
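The idea of a hybrid search facility can be illustrated with a short sketch which passes one query to several sources and merges the results. It does not describe how BUILDER or the DNER work: the two source functions, their sample records and the `www.foo.ac.uk` addresses are all hypothetical stand-ins for a Web site search engine and a library catalogue.

```python
def web_site_search(query):
    """Stand-in for the institutional Web site search engine (sample data)."""
    pages = [
        ("Library opening hours", "http://www.foo.ac.uk/library/hours.html"),
        ("Computing service", "http://www.foo.ac.uk/computing/"),
    ]
    return [(title, url) for title, url in pages
            if query.lower() in title.lower()]


def catalogue_search(query):
    """Stand-in for the University library catalogue (sample data)."""
    records = [
        ("Guide to library services", "http://www.foo.ac.uk/opac/record/1"),
    ]
    return [(title, url) for title, url in records
            if query.lower() in title.lower()]


# Each source is a (label, search function) pair; more can be appended.
SOURCES = [("web site", web_site_search), ("catalogue", catalogue_search)]


def hybrid_search(query, sources=SOURCES):
    """Query every source in turn and merge the hits into one list.

    Each hit is labelled with the source it came from, and a URL which
    has already been returned by an earlier source is not repeated.
    """
    seen = set()
    merged = []
    for label, search in sources:
        for title, url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append({"source": label, "title": title, "url": url})
    return merged
```

Labelling each hit with its source lets the interface show the user where a resource is held, which matters once results come from systems other than the Web site.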

Another development we may see is integration of the search interface with the browser. This can already be done, using the Google toolbar [6]. As illustrated below this interface can be used to search not only the Web but also the Web site of the page currently being viewed.

Figure 4: Google Search Facility Can be Integrated With A Browser

It should be possible to integrate the local search engine into the browser's toolbar, perhaps using the approach documented by the Bookmarklets Web site [7].
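Searching the Web site of the page currently being viewed amounts to restricting a query to that page's host name. The sketch below builds such a query address using Google's `site:` operator and `q` parameter. It is not a description of how the Google toolbar or any bookmarklet is implemented, the function name is illustrative, and a local search engine would need its own address and parameter names.

```python
from urllib.parse import urlencode, urlsplit


def site_search_url(page_url, terms, engine="http://www.google.com/search"):
    """Build a query URL restricted to the Web site hosting page_url.

    The host name is taken from the page being viewed and combined with
    the user's search terms using the "site:" operator.
    """
    host = urlsplit(page_url).hostname
    if not host:
        raise ValueError(f"no host name found in {page_url!r}")
    # urlencode escapes the colon and turns spaces into plus signs.
    return engine + "?" + urlencode({"q": f"site:{host} {terms}"})
```

For example, calling `site_search_url("http://www.foo.ac.uk/dept/page.html", "prospectus")` yields a query confined to `www.foo.ac.uk`, whichever page of the site the user happened to be viewing.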

Evaluating Search Engines

In order to help institutions evaluate search engine software UKOLN has set up a page [8] which allows its Exploit Interactive Web magazine to be searched using a number of search engines, as shown below.

Figure 5: Search Engines To Search Exploit Interactive

We would like to increase the range of search engines covered, to include not only search engines which are used within the community, but also search engines which may be of interest to the community. We have begun work on this by providing access to the ISYS search facility [9].

Anyone who would be willing to index the Exploit Interactive Web site and allow us to host the search interface should contact Brian Kelly (email B.Kelly@ukoln.ac.uk). Please note that in the case of licensed software, you will need to ensure that the licensing conditions allow you to index remote resources.

This page is complemented by a "Web tour" of the search engine interfaces provided on UK University Web sites [10].

Conclusions

For an institutional Web service ht://Dig appears to provide a safe and popular choice. It would appear that ht://Dig is popular because it is available free of charge rather than because its source code is available.

Google is now an option not just for smaller institutions with limited technical resources but also for providing a search facility for large institutions.

We are also likely to see more institutions providing access to more than one search facility, probably a locally-installed product and an externally-hosted one. This approach will provide the benefits of a search engine such as Google, while also providing a backup option if the external service is unavailable, as well as providing access to resources which an external search engine cannot access (e.g. resources on the Intranet) or which will not be indexed in a timely fashion (e.g. new resources which may not be indexed for several weeks).

However, even though well-used, mature search facilities are available, we can still expect to see developments in this area. Web managers would be advised to ensure that their search facilities can be migrated to new systems with the minimum of disruption.

References

  1. WebWatch: UK University Search Engines, Ariadne 21, Sept 1999
    http://www.ariadne.ac.uk/issue21/webwatch/
  2. Survey Of UK HE Institutional Search Engines - Dec 2001, UKOLN
    http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/2002-01/
  3. EWS Download, eXcite
    http://www.excite.com/navigate/download.html
  4. CLUMPS Projects Search, BUILDER, University of Birmingham
    http://www.builder.bham.ac.uk/cps/
  5. Welcome To the DNER, JISC
    http://www.jisc.ac.uk/dner/
  6. Google Toolbar, Google
    http://toolbar.google.com/
  7. Bookmarklets - free tools for power surfing, Bookmarklets
    http://www.bookmarklets.com/
  8. Search Exploit Interactive Using a Variety of Search Engines, Exploit Interactive
    http://www.exploit-lib.org/search/multiple.asp
  9. Search Exploit Interactive Using a Variety of Search Engines, Exploit Interactive
    http://www.exploit-lib.org/search/multiple.asp#isys
  10. Web Tour of UK HE Search Engines, UKOLN
    http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/web-tour/

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY

Email: b.kelly@ukoln.ac.uk

Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.