An Investigation Into World Wide Web Search Engine Use from within the UK: Preliminary Findings
The use of the World Wide Web (WWW) is increasing and is likely to continue to do so for the foreseeable future. The UK is served by a very good quality internal Internet backbone, which provides JANET (Joint Academic NETwork) users with a high level of bandwidth and therefore fast communications. Unfortunately, international connections from the UK are poor, and it is clear that ways of addressing the limited bandwidth the UK has at its disposal have to be found.
In this paper we describe the results of a project undertaken for UKERNA to investigate the use of overseas search engines by the UK academic community. These search engines are primarily based in the USA. As the amount of information available through the WWW has grown, congestion on the transatlantic link has risen because more users access search engines to locate valuable information.
To address this problem we describe two key areas of research we have undertaken to identify current search engine use trends. The overall goal was to identify whether a UK based search engine would help alleviate international bandwidth congestion.
The Project Overview
Currently, UK based cache servers are servicing in the order of 900,000 requests per day, and this is set to rise. Because of the limited bandwidth at our disposal, strategies are required to improve the service to users by providing UK based facilities, thus reducing transatlantic traffic and thereby improving overall performance. Siting a new search engine within the UK, or "mirroring" an existing one, has been proposed as a means of reducing the amount of international traffic.
We decided from the beginning of the project that our investigation should provide a means for the whole of the UK Internet community, and in particular the academic community, to contribute their opinions. We therefore decided to proceed in two main research directions:
- An examination of UK based server logs, contributed by various academic organisations. Two different data sets were compiled and the actual search engine accesses were analysed. It was felt that this would provide statistics on the number of search engine accesses and on which search engines were accessed the most.
- An electronic survey (based on a WWW questionnaire). This involved the design and construction of an electronic questionnaire to establish academic user requirements, and to gather information about user behaviour, preference, and satisfaction or dissatisfaction with the current provisions of search engines.
An Investigation of Server Logs
UKERNA kindly provided us with Web log data to analyse. This was provided anonymously from various sites within the UK. The names of the organisations which supplied the data have to remain confidential, but they represent a good sample of about 10% of the sites on the JANET network, from the very large to the very small.
Two separate logs were compiled:
- The first contained 1,000,542 URLs. This was taken from the cache of a single site and covers one week, from 19th to 25th October 1996.
- The second, much larger sample contained 16,762,872 URLs. This data was collected from 23 JANET sites and covers a period from June 1995 (for a few of the sites) up to late October 1996.
In the initial stages of analysis the logs were examined to provide the total number of URLs which contained a match to any of the 50+ search engines which we had previously identified. While this provided results, it did not prove a very satisfactory way of identifying which search engines were accessed more than others because:
- All graphics on the search engine pages were counted as separate hits, so search engines with more graphics on their pages fared better.
- The results included any visits to search engines whether searches were performed or not.
To address the first problem we decided to narrow the search patterns down to remove all GIF and JPEG references from the search. To address the second problem we decided to search only for URLs of web pages which contained results of searches generated by search engines.
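The two refinements described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the scripts used in the project: the result-page patterns in RESULT_PATTERNS, the helper names and the sample URLs are all hypothetical, and a real analysis would need one carefully checked pattern for each of the engines being counted.

```python
import re
from collections import Counter

# Assumed, illustrative patterns for search *result* pages. The patterns
# actually used in the project are not given in this paper.
RESULT_PATTERNS = {
    "AltaVista": re.compile(r"altavista\.digital\.com/cgi-bin/query\?", re.I),
    "Yahoo": re.compile(r"search\.yahoo\.com/bin/search\?", re.I),
    "Lycos": re.compile(r"lycos\.com/cgi-bin/pursuit\?", re.I),
}

IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg")


def is_image(url):
    """Return True if the URL path (ignoring any query string) is a GIF or JPEG."""
    path = url.split("?", 1)[0]
    return path.lower().endswith(IMAGE_EXTENSIONS)


def count_result_pages(urls):
    """Count search result page URLs per engine, ignoring graphics."""
    counts = Counter()
    for url in urls:
        if is_image(url):
            continue  # first refinement: drop GIF and JPEG references
        for engine, pattern in RESULT_PATTERNS.items():
            if pattern.search(url):
                counts[engine] += 1  # second refinement: result pages only
                break
    return counts


# Hypothetical log entries, for illustration only.
sample_urls = [
    "http://www.yahoo.com/images/main.gif",
    "http://www.yahoo.com/",
    "http://search.yahoo.com/bin/search?p=janet",
    "http://altavista.digital.com/cgi-bin/query?pg=q&q=ukerna",
    "http://altavista.digital.com/av/pix/logo.jpg",
]
print(count_result_pages(sample_urls))
```

In this sketch a visit to a search engine's front page, or a request for one of its graphics, contributes nothing to the count; only URLs that look like generated result pages are tallied.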
Analysis of the 1,000,000 URL Sample
Figure 1 illustrates the results of the smaller data sample. This contained a total of 1,000,542 URLs. Our analysis returned a total of 18,015 search engine result page URLs, which are presented as a percentage distribution.
Figure 1 - Search Engine distribution from the 1,000,000 URL Sample.
The most commonly identified search engine result pages were Yahoo 31%, Excite 25%, Magellan 15% and AltaVista and Lycos with 8% each.
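A percentage distribution of this kind can be derived from per-engine counts as in the short sketch below. The per-engine counts behind Figure 1 are not reported in this paper, so the function name and the counts in the example are purely hypothetical placeholders.

```python
def percentage_distribution(counts):
    """Convert a mapping of engine name -> hit count into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when no result pages were found.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical counts, not taken from the project data.
example_counts = {"EngineA": 50, "EngineB": 30, "EngineC": 20}
for name, share in percentage_distribution(example_counts).items():
    print(f"{name}: {share:.0f}%")
```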
Analysis of the 16,000,000 URL Sample
Figure 2 illustrates the results of the larger data sample. This contained a total of 16,762,872 URLs. Our analysis returned a total of 230,76 search engine result page URLs, which are presented as a percentage distribution.
Figure 2 - Search Engine distribution from the 16,000,000 URL Sample.
The most commonly identified search engines were Lycos 24%, Yahoo 23%, Magellan 18%, AltaVista 12% and Excite 10%.
World Wide Web Search Engine User Survey
Our survey was carried out via a WWW Questionnaire between July and September 1996. Respondents were asked to complete a form consisting of 14 questions which were submitted electronically and processed automatically by a Common Gateway Interface (CGI) application. Obtaining results in this way enabled us to rapidly analyse the data and produce statistics.
We present below some selected results from our survey, but those interested in obtaining full results can do so at: http://osiris.sunderland.ac.uk/sst/se/
The survey returned 402 questionnaires, which we used in the production of our results. Of these, 353 respondents (88%) were from domains with the extension .ac.uk. No single UK domain accounted for more than 7% of respondents; the largest contributor was the University of Sunderland.
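As a rough illustration of how an electronically submitted questionnaire can be decoded for automatic processing, the sketch below parses one URL-encoded form submission using Python's standard library. It is not the CGI application used for the survey; the field names (engines, first, reason) and the sample submission are assumptions made for the example.

```python
from urllib.parse import parse_qs


def parse_submission(body):
    """Decode one URL-encoded form submission into a dict of answer lists.

    Fields that appear more than once (for example a multiple-choice
    question) yield a list with one entry per selected value.
    """
    fields = parse_qs(body)
    return {name: [v.strip() for v in values] for name, values in fields.items()}


# Hypothetical submission with assumed field names.
body = "engines=AltaVista&engines=Yahoo&first=AltaVista&reason=Speed+of+access"
print(parse_submission(body))
```

Keeping every answer as a list, even for single-choice questions, means that multiple-choice and single-choice fields can be tallied by the same code.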
Which Search Engines Do You Use?
Respondents were asked to identify the search engines that they used. Figure 3 illustrates the search engines which respondents identified that they used most commonly. Respondents were allowed to select and/or name more than one search engine. The most frequently identified were AltaVista with 312 responses, Yahoo with 280 responses and Lycos with 225 responses.
Figure 3 - Most Commonly Used Search Engines
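Because respondents could select or name more than one search engine, the response totals for this question can exceed the number of questionnaires returned. A minimal sketch of such a tally is given below; the function name, the normalisation of names to lower case and the sample responses are assumptions for illustration and do not reproduce the survey data.

```python
from collections import Counter


def tally_selections(responses):
    """Count how many respondents selected each engine.

    Each response is a list of engine names. Names are normalised to
    lower case with surrounding spaces removed, and a respondent is
    counted at most once per engine.
    """
    tally = Counter()
    for selected in responses:
        names = {name.strip().lower() for name in selected if name.strip()}
        tally.update(names)
    return tally


# Hypothetical responses from three respondents.
responses = [
    ["AltaVista", "Yahoo"],
    ["altavista "],
    ["Lycos", "Lycos"],
]
print(tally_selections(responses).most_common())
```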
Which Search Engine Would You Use First?
In an attempt to gauge which search engines users preferred, respondents were asked which search engine they normally used first. The results are illustrated in Figure 4. AltaVista scored most highly here with 194 responses, which equated to 48% of responses. The next most popular were Yahoo with 59 responses (15%) and Lycos with 46 responses (12%).
Figure 4 - Which Search Engine do you use first?
Why Do You Choose a Search Engine?
Respondents were asked why they chose to use a particular search engine. Figure 5 illustrates that there was a wide range of responses to this question. The most popular reasons were speed of access to information (186 responses) and the amount of information users perceived to be stored in the search engine's repository (146 responses).
Figure 5 - Reasons for using favourite search engine
What Problems Do You Encounter With Existing Search Engines?
Users identified three clear problems that they encountered when using existing search engines; these are illustrated in Figure 6. The primary problems reported are:
- The time it takes for the search engine to return its information is too long.
- Too much information is normally returned.
- Information returned is out of date or no longer in existence.
Figure 6 - Problems with existing search engines
Do You Use Any UK Based Search Engines?
We asked respondents whether they used any UK based search engines. Figure 7 illustrates that 167 respondents (42%) claimed that they did not use a UK based search engine. However, 139 respondents (35%) were unsure whether they did or did not.
Figure 7 - Respondents who use UK based search engines
Why Do You Use Non-UK Based Search Engines in Preference to UK Based Ones?
Figure 8 illustrates that, when asked why non-UK search engines were used in preference to UK based ones, 54% of respondents claimed that "lack of information" was the primary reason.
Figure 8 - Reasons for not using UK based search engines.
If the UK Was to Mirror an Overseas Search Engine, What Would Be Your First Choice?
Figure 9 illustrates that 212 respondents (53%) would choose AltaVista as their first choice of search engine to mirror.
Figure 9 - First choice of search engine to mirror in the UK
Conclusions
We have not yet drawn any conclusions from our research. Results of the project are to be presented to UKERNA and other interested parties at the University of Sunderland on 20th November. During this meeting final conclusions are to be drawn, giving everyone who attends an opportunity to contribute to the conclusions and to provide an opinion on our findings.
Acknowledgements
We would especially like to thank Mr Stephen Bonner of UKERNA for his support and co-operation during this project.