Web Watch: Carrying Out Your Own Web Watch Survey
In 1997 UKOLN received funding for a project known as WebWatch [1]. The aim of the WebWatch project was to develop and use automated robot software to analyse Web sites across a number of public sector communities. After the project funding finished, UKOLN continued to provide WebWatch surveys across communities such as UK Higher Education Web sites. However, once the initial WebWatch software developer left, it was decided to adopt a slightly different approach: rather than continuing to develop our own WebWatch robot software, we chose to make use of freely-available Web-based services.
The feedback we have received on our WebWatch surveys has been positive. A number of people have expressed interest in running their own WebWatch surveys. This article describes how you can apply the same methodology yourself across your own community, such as UK HE academic libraries, a particular academic discipline, projects funded by a particular programme, etc.
Why Carry Out Your Own WebWatch Survey?
Why would you wish to carry out your own WebWatch survey? There are several reasons: you may be a national centre and wish to observe approaches taken to the development of Web sites within your community; you may be a funding body and wish to ensure that Web sites you have funded comply with service level agreements; you may have an academic or research interest in developments; etc.
Whatever your reason WebWatch surveys will help you to:
- Make comparisons across a community
- Spot trends
- Measure the effects of other developments (e.g. has a publicity campaign resulted in more links to Web sites?)
- Measure the uptake of architectures and technologies
Current WebWatch Methodology
Current WebWatch surveys are mainly based on the use of Web-based services. Although in some cases it would be possible to make use of desktop applications, the Web-based solution has been adopted since the methodology is transparent and this approach allows end users to have live access to the survey services for themselves, enabling them to reproduce the findings. Wherever possible the surveys make use of freely available Web services which will allow others to repeat the surveys without having to incur any charges for using licensed services. The methodology also allows them to compare arbitrary Web sites with published findings.
There is a wide variety of Web-based services which can be used in the way described. A number of them are listed in the following table.
Service | Function |
NetMechanic | General testing, including page analysis of HTML, links and file size. |
Bobby | Accessibility testing and page file size analysis. |
LinkPopularity | Numbers of links to a Web site, based on AltaVista, Google and HotBot. |
In order to check if a Web service which provides reports and an analysis of a Web site can be used in a WebWatch survey, go to the Web service and use it. In the page containing the results examine the URL window: if it contains the URL of the Web site you have analysed, you should be able to use the service to survey your community.
Figure 1: Use of the Bobby Accessibility Checking Service
Figure 1 illustrates this. The URL of the entry point of the UKOLN Web site is supplied. The Bobby accessibility checking service then analyses the UKOLN Web site. The URL of the page of results contains the URL of the UKOLN entry point:
http://bobby.cast.org/bobby?URL=http%3A%2F%2Fwww.ukoln.ac.uk%2F
(It should be noted that in this case certain characters (/ and :) are encoded.) Further information on this technique is given on the Bobby Web site [2].
We can then simply include the URL given above as a hypertext link in a Web page. Repeat this for all the entry points in our community and we can provide a WebWatch survey of the accessibility of the entry points within our community. If we so desire, we can use other Web services in a similar way to provide a more comprehensive survey.
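The construction of these links can be scripted rather than typed by hand. The following is a minimal sketch in Python, assuming the service accepts the address to be analysed in a `URL` parameter exactly as in the Bobby example above; the function name `survey_link` is illustrative only.

```python
from urllib.parse import quote

# Prefix of the analysis service, taken from the Bobby example in the text.
BOBBY_PREFIX = "http://bobby.cast.org/bobby?URL="


def survey_link(entry_point, service_prefix=BOBBY_PREFIX):
    """Return the service URL that analyses the given entry point."""
    # safe="" makes quote() encode '/' and ':' too (as %2F and %3A).
    return service_prefix + quote(entry_point, safe="")


if __name__ == "__main__":
    print(survey_link("http://www.ukoln.ac.uk/"))
```

Calling the function once for each entry point gives the set of links needed for the survey page. Other services may use a different parameter name, so the prefix should be copied from a results page obtained by hand.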
It should be noted that if the URL of the results page does not include the URL of the Web site being analysed, the technique described in this article cannot be used with that service.
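This check can be expressed as a small test. The sketch below, in Python, assumes you have copied the URL of a results page from your browser after using the service by hand; the function name is hypothetical and the second address in the usage example is invented.

```python
from urllib.parse import unquote


def result_url_mentions_site(result_url, site_url):
    """Return True if the results-page URL contains the analysed site's URL,
    either literally or in percent-encoded form."""
    decoded = unquote(result_url)  # turns %3A and %2F back into ':' and '/'
    return site_url in result_url or site_url in decoded


if __name__ == "__main__":
    bobby = "http://bobby.cast.org/bobby?URL=http%3A%2F%2Fwww.ukoln.ac.uk%2F"
    print(result_url_mentions_site(bobby, "http://www.ukoln.ac.uk/"))
    # An invented results URL which hides the analysed site behind an identifier.
    print(result_url_mentions_site("http://example.org/results?id=123",
                                   "http://www.ukoln.ac.uk/"))
```

A service which passes the test is a candidate for the survey; one which fails it would need to be used manually.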
Carrying Out Your Own WebWatch Survey
We have described the technique used for carrying out WebWatch surveys. So how should we proceed with a survey?
Collecting The Web Site Entry Points
The first thing to do is to find an authoritative source for the entry points of the Web sites in your community. An organisation such as HESA may provide a definitive list of entry points to UK Higher Education Institutions [3]. Other useful starting points include NISS [4] and University of Wolverhampton UK Sensitive Maps [5]. Organisations outside the HE community who provide directories include iBerry's list of Higher Education Links [6] and Tagish's directory of UK Universities [7]. You will sometimes find lists provided by volunteers such as Ian Tilsed's list of UK Higher Education & Research Libraries [8] and Tom Wilson's World List of Departments and Schools of Information Studies, Information Management, Information Systems, etc. [9]. With all of these services you should bear in mind that the quality of the data may be variable: for example, Dr Dave's links to UK HEIs gives a link to the Computer Science Department at the University of Hull and not the University of Hull entry point [10].
You should bear in mind that although some organisations and individuals may be willing for their data to be reused in this way, others may not. You should seek permission before reusing data from other people's Web sites - you may find that the Web site owner is happy for the data to be reused.
Using Your Data
Once you have obtained your list of URLs of entry points, you will have to use these as input data for the Web-based services you will be using. It would be a simple but, on a large scale, time-consuming task to copy and paste the URLs into an HTML template for each entry point (having remembered to convert any special characters, such as ://).
A better way is to store your data in a backend database, and to wrap the appropriate HTML tags around the data when accessing the data. Another alternative would be to make use of server-side scripting (e.g. ASP on a Windows NT platform or PHP on a Unix platform).
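As an illustration of the database approach, here is a minimal sketch in Python using SQLite in place of the ASP or PHP options mentioned above. The table name `entry_points`, its columns and the function name are assumptions made for this example, and the service prefix is the one from the Bobby example earlier in the article.

```python
import sqlite3
from html import escape
from urllib.parse import quote

# Service prefix taken from the Bobby example in the text.
SERVICE_PREFIX = "http://bobby.cast.org/bobby?URL="


def build_survey_page(connection):
    """Return an HTML list with one analysis link per stored entry point."""
    rows = connection.execute(
        "SELECT name, url FROM entry_points ORDER BY name"
    ).fetchall()
    items = []
    for name, url in rows:
        # Encode the entry point, then wrap the HTML tags around the data.
        link = SERVICE_PREFIX + quote(url, safe="")
        items.append('<li><a href="{}">{}</a></li>'.format(
            escape(link, quote=True), escape(name)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # hypothetical in-memory database
    db.execute("CREATE TABLE entry_points (name TEXT, url TEXT)")
    db.execute("INSERT INTO entry_points VALUES (?, ?)",
               ("UKOLN", "http://www.ukoln.ac.uk/"))
    print(build_survey_page(db))
```

Keeping the entry points in one table means that a second service can be surveyed by changing only the prefix, without touching the data.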
Once you have generated links to the Web site services, you will need to manually follow the links to obtain the results and then store them.
It would be possible to write a script which initiated the requests, and processed the results for you (this is sometimes known as "HTML-scraping"). However there is a danger that automated submission of requests to a service which has been developed for use by humans could degrade the performance of the service, and even, in extreme circumstances, result in a 'denial of service' attack. You should not use this approach unless you have obtained permission from the service provider.
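If permission has been obtained, one way of reducing the load on the service is to pause between requests. The sketch below, in Python, shows only the pacing logic: the `fetch` function is supplied by the caller, so no particular HTTP library is assumed, and the 30 second default delay is an arbitrary illustrative value which should be agreed with the service provider.

```python
import time


def run_survey(links, fetch, delay_seconds=30.0, sleep=time.sleep):
    """Request each link in turn, pausing between consecutive requests.

    `fetch` is a caller-supplied function which takes a URL and returns the
    text of the results page. `sleep` can be replaced when testing.
    """
    results = {}
    for index, link in enumerate(links):
        if index > 0:
            sleep(delay_seconds)  # never send requests back to back
        results[link] = fetch(link)
    return results
```

The results are returned as a dictionary keyed by link, ready to be stored or processed further.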
Analysing Your Findings
Once you have carried out your survey you will need to analyse your findings. You will normally find that you have a set of numerical values for your community: for example the size of entry points, the number of links to a site, the number of broken links on a site, etc. These can be summarised in a graphical format.
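A first summary needs little more than a few lines of code. The following sketch, in Python, computes some basic figures and prints a simple text bar chart; the site names and sizes are invented for illustration and the function names are not part of any WebWatch software.

```python
from statistics import mean, median


def summarise(values):
    """Return basic summary figures for a list of numeric survey results."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def text_chart(results, width=40):
    """Return one line per site, with a bar proportional to its value."""
    largest = max(results.values())
    lines = []
    for site, value in sorted(results.items(), key=lambda item: item[1]):
        bar = "#" * round(width * value / largest) if largest else ""
        lines.append("{:<12} {:>6} {}".format(site, value, bar))
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented figures, for illustration only (entry point sizes in Kb).
    sizes = {"site-a": 20, "site-b": 40, "site-c": 90}
    print(summarise(list(sizes.values())))
    print(text_chart(sizes))
```

For publication a spreadsheet or charting package will give a more polished result, but a quick chart of this kind is often enough to show the shape of the data.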
You should note, however, that the findings you obtain may not be correct. Use of the Bobby and NetMechanic tools to analyse file sizes revealed major discrepancies in a number of cases. Further examination revealed that this was due to several factors including: (a) one service measures only images and HTML files while the other also measures external style sheet and JavaScript files; (b) one service will not analyse files if the site has a robots.txt file which requests that robots do not access files in named directories, whereas the other service ignores the robots.txt file.
When summarising the file sizes it was noticed that several appeared to be very small. Further examination revealed that this could be due to several factors: (a) analysis of a redirect message from the server; (b) analysis of a NO-FRAMES message (e.g. a message saying "Your browser does not support frames"); (c) analysis of other error messages.
In light of these findings you should be wary of the results and, in particular, examine any outliers in your findings in more detail.
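One way of picking out results which deserve a closer look is sketched below in Python. It flags values lying outside the conventional interquartile-range fences and, separately, values below a minimum plausible size. The 1 Kb floor, the factor of 1.5 and the sample figures are assumptions chosen for illustration, not values used in the WebWatch surveys. The code requires Python 3.8 or later.

```python
from statistics import quantiles


def flag_for_review(sizes, min_plausible=1.0, factor=1.5):
    """Return a dict mapping each suspicious site to the reason it was flagged."""
    q1, _, q3 = quantiles(list(sizes.values()), n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    flagged = {}
    for site, size in sizes.items():
        if size < min_plausible:
            # May be a redirect, a NO-FRAMES message or another error message.
            flagged[site] = "very small"
        elif size < low or size > high:
            flagged[site] = "outlier"
    return flagged


if __name__ == "__main__":
    # Invented entry point sizes in Kb, for illustration only.
    sample = {"a": 0.3, "b": 30, "c": 35, "d": 40,
              "e": 45, "f": 50, "g": 55, "h": 400}
    print(flag_for_review(sample))
```

A flagged site is not necessarily wrong; the flag is simply a prompt to visit the page and check what the service actually measured.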
Other Useful Tools
As well as use of Web sites, there are other tools which can be used to support your survey.
Rolling Demonstrations
University Web managers have found the "rolling demonstration" of University entry points [11], search engines [12] and 404 error pages [13] to be helpful when thinking about the redesign of local facilities. Information on how to do this for your own community is available [14].
Bookmark Managers
You may wish to manage the URLs of your entry points in a bookmark manager. You could use the bookmark facility in your Web browser. However you may find that dedicated bookmark management tools will provide richer functionality. For example, you may wish to receive notification when an entry point no longer exists or the contents of a page changes. Many bookmark managers will provide such functionality (e.g. see [15] [16]).
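To illustrate the kind of check involved, the sketch below, in Python, compares stored fingerprints of pages with freshly retrieved copies. It is one plausible way of detecting missing or changed pages, not a description of how any particular bookmark manager works; the function names and data layout are assumptions, and retrieval of the pages is left to the caller.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, fixed-length digest of a page's content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def compare_snapshots(previous, current):
    """Compare stored fingerprints with newly retrieved pages.

    previous: dict mapping URL -> fingerprint recorded on the last visit.
    current:  dict mapping URL -> page text; a URL which could not be
              retrieved is absent or mapped to None.
    """
    report = {"missing": [], "changed": [], "unchanged": []}
    for url, old_print in previous.items():
        text = current.get(url)
        if text is None:
            report["missing"].append(url)
        elif fingerprint(text) != old_print:
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report
```

Note that a fingerprint of the whole page will report a change whenever any content differs, including a date or a counter, so in practice some tolerance may be needed.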
In addition to desktop bookmark managers there is also a range of Web-based bookmark management tools which you may wish to use (e.g. see [17]). Since Web-based bookmark managers can be accessed by everyone, not only can they be used to help you manage your WebWatch survey, they could also act as part of the survey itself. For example, the WebWatch surveys make use of the LinkBank link management tool. As well as providing email notification when a resource is no longer available, it also provides an automated display of resources [18].
Offline Browsers
You may wish to consider using an offline browser in order to capture Web pages from your community and hold them locally (e.g. see [19] [20]). This would, perhaps, be a way of archiving pages in order to make comparisons at a later date (although there are copyright and legal issues which you will have to consider if you wish to do this).
Disseminating Your Findings
Your survey of the Web sites across your community should be of interest to members of the community. The findings should help them to gauge how they compare with their peers and will encourage those whose Web sites are reported favourably and act as a spur for those whose Web sites did not appear to do so well.
You may wish to make your findings available on the Web. Other options include writing articles based on your findings or giving presentations at appropriate conferences.
You may also find it useful to repeat your survey periodically in order to monitor trends.
You may also find it useful to compare your findings with other communities: for example, how do UK University entry points compare with those in the USA [21]?
In The Future
The approach described in this article is based on use of freely-available Web sites. However, there are a number of limitations to this approach:
- The Web sites are designed for manual use by humans: The Web sites mentioned in this article were developed for use by humans. The services assume that individuals will enter URLs into a Web form and will read the results which are displayed. The Web sites may not have been designed for use in the way described in this article.
- Dangers of server overload: If the technique described in this article is extended to allow automated submission of requests, there is a danger that the Web site could become overloaded.
- Changes in business models: The article describes use of freely-available Web services, as this allows end users to reproduce the surveys. However, the providers of the Web sites may change their policy on use of their freely-available services. This may mean that surveys cannot be rerun.
- Change control: The providers of the Web sites may change the internal workings of their form submission pages, which may result in surveys ceasing to work.
- Difficulties in reuse of data: The output from many of the Web sites is designed for reading using a Web browser and is often not suitable for post-processing.
In the future we should see solutions which will address the limitations of the current approach. The term "Web Services" has been used to describe reusable software components which are designed for use by other applications and can be accessed using standard Web protocols. Use of "Web Services" to provide auditing and benchmarking services about Web sites would appear to address the concerns mentioned above.
References
1. WebWatch, UKOLN
   http://www.ukoln.ac.uk/web-focus/webwatch/
2. Terms of Use: About Bobby, CAST
   http://www.cast.org/Bobby/TermsofUse314.cfm
3. Higher Education Universities and Colleges, HESA
   http://www.hesa.ac.uk/links/he_inst.htm
4. UK HE Institutions, NISS
   http://www.niss.ac.uk/sites/he-cis.html
5. UK Sensitive Maps, University of Wolverhampton
   http://www.scit.wlv.ac.uk/ukinfo/uk.map.html
6. iBerry, iBerry
   http://iberry.com/
7. Directory of UK Government Offices Web Sites, Tagish
   http://www.tagish.co.uk/tagish/links/#univ
8. UK Higher Education & Research Libraries, Ian Tilsed, University of Exeter
   http://www.ex.ac.uk/library/uklibs.html
9. World List of Departments and Schools of Information Studies, Information Management, Information Systems, etc., Tom Wilson, University of Sheffield
   http://informationr.net/wl/
10. Dr Dave's UK Pages, Dr Dave
   http://uk-pages.net/ukframes4.shtml
11. UK University Web Entry Points, UKOLN
   http://www.ukoln.ac.uk/web-focus/site-rolling-demos/universities/
12. Web Tour of UK HE Search Engines, UKOLN
   http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/web-tour/
13. Web Tour of UK HEI 404 Error Pages, UKOLN
   http://www.ukoln.ac.uk/web-focus/site-rolling-demos/university-404-pages/
14. Rolling Demonstrations of Web Resources, UKOLN
   http://www.ukoln.ac.uk/web-focus/site-rolling-demos/#guidelines
15. Directory of Bookmark Managers, ZDNet
   http://www.zdnet.com/searchiq/siteguides/bookmarkmanagers.html
16. Bookmark Managers, Open Directory
   http://dmoz.org/Computers/Internet/On_the_Web/Web_Applications/Bookmark_Managers/
17. Web-Based Bookmark Managers, WebWizards
   http://www.webwizards.net/useful/wbbm.htm
18. WebWatch Links, LinkBank
   http://www.linkbank.com/get_links/ariadne/default/13/
19. Offline Browsers, TUCOWS
   http://www.tucows.com/offline95.html
20. Offline Browsers, Winappslist
   http://www.winappslist.com/internet/offline_browsers.htm
21. US Universities
   http://www.utexas.edu/world/univ/alpha/
Author Details
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY
Email: b.kelly@ukoln.ac.uk
Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.