Planet SOSIG: Asking Questions - The CASS Social Survey Question Bank
The purpose of this article is to introduce The Question Bank contents and situate the resource in the context of its Information Space, that is its relationship to other projects that aim to make social surveys more accessible.
I have the subsidiary aim of using this text to present the choices and decisions that need to be identified, preferably before undertaking the introduction of a medium-sized web-based information resource. I aim to be decidedly non-technical, although many of the problems the Question Bank team has overcome have been solved because of the increasing flexibility that newer software offers. This article itself breaks the golden rules of writing for the web: it is not concise, scannable or objective. But the recommendations it makes are all in line with current entreaties to keep the web simple, fast and user-oriented.
Throughout this piece is a not so heavily veiled critique of overly complicated resources that claim for themselves the right to define rules of data management, but rarely deliver consistent or integral data storage or reliable and standardised rules of access. In effect this is a claim for the legitimacy of smaller, more focussed information sources, complementary to the larger, more resource-intensive operations, for resources that aim at being visible and accessible rather than making any great claim to comprehensiveness or authority.
Ideally such niche driven operations should be flexible and responsive, with the ability to alter course faster than the supertankers. This depends on the navigation skills of the crew, but also one needs a chart of the ocean. Adequate and fully functional user appraisal and feedback is the crucial element, and this is a hugely problematic concept in web resource provision where the user is predominantly a taker and not a participant. The opportunity to write this, slightly contentious, introduction for a new audience is very welcome. I have to state that the Question Bank is a resource in development and that any criticism of its structure or content, however harsh, is of course welcome.
Social surveys and data archives
Whether or not one agrees that quantitative methods are the best way to measure social features of the population, it is a fact that large scale social surveys have become the major tool for gathering such data and are extremely influential upon government and social policy. Not only has the process of survey construction become highly professionalised, but also the techniques for data gathering, to say nothing of analysis and interpretation, and storage and dissemination, are becoming increasingly complex. A perceptible gap is appearing between sociologists and survey professionals, and between students of sociology and social policy and those who create, interpret and act upon survey findings.
The theorist-practitioner gulf is exacerbated as much by organisational as by technological change. Downsizing, outsourcing, competitive tendering and the fragmentation of the large government agencies into autonomous business units have meant that independent survey organisations, such as The National Centre for Social Research in the UK, now compete favourably with state departments such as The Office for National Statistics (ONS) for data collection contracts, and even for the design of the major longitudinal national surveys. In such a situation documentation becomes a serious issue. What exactly was asked and in what context? From a more critical sociological perspective, as well as a conscientious professional one, there is often a need to know just what was the context of a given question or set of questions that led to a certain set of data and their various interpretations.
In addition to organisational Balkanisation there is, as there always was, the structural complexity of the survey. With the inclusion of multi-part and sub-group targeted aspects to nearly all the major data gathering exercises there arises a plethora of paper and electronic objects, be they questionnaires, special-interest group supplements, interviewer manuals and notes, or diverse visual material such as showcards or pictures. All the previous applies without even considering the complexities introduced with non-linear telephone and computer based interviewing techniques. For those interested in survey construction, secondary analysis or even survey interpretation to any degree of accuracy, wading through this mire of detail is truly mind-boggling, if and when they actually manage to find any of it.
A requirement of the Economic and Social Research Council (ESRC) is that data and documentation generated by any of the projects or surveys that it funds should be lodged at the Data Archive, at the University of Essex. The Government Statistical Service (GSS) and The Office for National Statistics (ONS), along with other survey generating government agencies, including those responsible for the Census of Population every 10 years, tend to deposit their data with the Data Archive too, although they have their own data service, Statbase™.
Although the scale and breadth of such archives is impressive, users must often register or even pay for access to information, the search interfaces are often complex with steep and idiosyncratic learning curves, online documents and datasets are large with long download times, and documentation is often patchy and poorly hierarchised or explained. As a consequence of this general inaccessibility third party data providers, such as Manchester Information and Associated Services (MIMAS), have been funded to provide data subsets and selections, together with associated documentation, aimed specifically at academics and sociologists.
Whatever advances the above resources have made available to analysts, they all operate to data-driven agendas. Before 1996 there was scant and uneven attention paid to the documentation of the questionnaires themselves. In recognition of this the ESRC, in 1995, funded the establishment of The Centre for Applied Social Surveys (CASS). A large component of this virtual organisation was to be an online questionnaire resource, The Question Bank (Qb).
The CASS Question Bank
The Centre for Applied Social Surveys was set up as a virtual organisation run jointly between three host organisations: The National Centre for Social Research (then SCPR), The University of Southampton and The University of Surrey. CASS is responsible for a popular series of courses on survey methodology, and for the Question Bank.
The Question Bank was introduced expressly to deal with questionnaires and questions and not data or datasets. It is an online resource that is freely available over the World Wide Web without registration or payment. First available in 1996, the resource has over the last six months undergone a radical overhaul of its interface and structure, and there is almost daily change and improvement to the content of the site. Development has not been untroubled, and though this article deals with an overview of the scope and use of the Question Bank, I shall make continuous reference to the hard choices necessary in undertaking a project of this nature.
Given the brief to make surveys available online, something which in 1995 no other agency was doing in any systematic way, the directors of the Qb project, Roger Thomas and Martin Bulmer, had to make decisions for the project scope based on resources. Essentially the balance in such an endeavour boils down to expertise, delivery of material and available labour time, though there are subsidiary concerns that I shall touch upon.
Whole questionnaires
The main imperative was that the resource should consist of whole questionnaires. The reasons for this were threefold:
- Questions should be seen in their original environment and not separated into atomic, decontextualised units (i.e. questions).
In real world surveys questions are asked in blocks, the routing through a question sequence may be complex and depends upon the responses of the interviewee, and data is often derived from combinations of questions after the interview has occurred. The analyst therefore, although they may be interested in only one section of a survey, might need to trace down separate questions and to understand their interconnection. Likewise the student or social researcher might want to see how a sensitive or awkward subject is approached, how the questions lead and follow each other, and what sequence has produced proven and reliable results.
For these reasons material in the Question Bank is reproduced as far as possible in the original format. Aside from 'chunking' the documents to optimise for rapid download over the net, all the questionnaires in the Question Bank have been processed without loss of context. Initially this meant representation of paper based survey tools, but more recently interactive programme based questionnaires (CAPI) have become the norm (see later in this article).
- All other resources in the same field organise and structure their holdings along the lines of indexes.
These indexes are made at either the levels of question, subject or general survey scope (e.g. The Data Archive uses its own indexing system, HASSETT, to which it has devoted enormous resources and time). This means that in searching for content of specific interest the user is limited to the structure pre-set by the indexers of the data base, or by the conceptual complexity and coverage of the index thesaurus. In effect all data or text strings not explicitly indexed are invisible to the enquirer, whether they are using a general web search engine like Infoseek, or are querying the resource holdings through its own quirky interface. (A controversial exception to this rule is the ONS Harmonised Question Scheme that aims to make certain themed question blocks available across a wide range of government surveys, still however only visible if you have a copy of the key document).
The Question Bank is fully searchable, every word of every document being visible through a single search enquiry field. We have selected the MUSCAT™ search engine, which can handle natural language enquiries, and through fuzzy logic can find words associated with the search term. Furthermore, since a large amount of Question Bank material is in the semi-structured document hierarchy characteristic of a website, rather than hidden away in a proprietary database, it is visible to WWW worms and robots, and is thus regularly indexed by the general search engines. Since these are most people's first choice of information gateway this strategy opens the Qb material to the widest possible audience. A search for a named social survey, for example the British Household Panel Survey (BHPS) or the Family Expenditure Survey (FES), using Infoseek, Hotbot, Yahoo, Northern Light or similar popular tools, is far more likely to return a Question Bank document than pages of any of the major archives, or even the websites of the government agencies responsible for sponsoring the surveys.
- Important surveys produce datasets that are often re-analysed.
Until recently no one has systematically stored the documentation which actually generated those data sets. A driving force in the founding of the Question Bank was an attempt to rescue significant examples of such key questionnaires and their documentation. This has necessitated an extremely laborious process of conversion from paper to electronic format. Technically, for a document to be fully searchable, each and every word must be visible to the search engine. This is a relatively minor problem where an electronic file is available, since it can be converted to a web distributable format by any of a growing number of conversion tools. (In practice it should be noted that the choice is rather more limited than it might seem). For paper however each word must be proof read to ensure that it has been correctly scanned, converted and represented in the final document.
In the Question Bank this conversion step has formed the bulk of the work. In contrast to many other resources where a mass (read automatic) conversion process is undertaken, with its concomitant increase in the percentage of conversion error, we have chosen a process of 'cherry-picking' key documents and ensuring that they are fully machine searchable. Given prior access to most electronic documentation, The Data Archive amongst others, has now begun to publish questionnaires to the web. Usually however the survey questions form only sections of large 'documentation' packages that might take minutes and minutes to download before the user is able to open them, look, and find that they did not contain the required material. In one strategy the Question Bank might be used as a scanning tool to find interesting microdata at the question level, and then follow up the enquiry in the more extensive holdings of the larger resources.
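The contrast drawn above, between holdings that can only be reached through a pre-set index and documents in which every word is searchable, can be sketched in a few lines of Python. This is only an illustration of the general idea and not a description of how MUSCAT or any archive actually works; the document names, texts and index terms are all invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection: names and texts are invented for illustration.
DOCUMENTS = {
    "bhps_wave1.pdf": "How many people live in this household? What is your ethnic group?",
    "fes_1996.pdf": "How much did the household spend on fuel last week?",
}

# A pre-set subject index: only the terms an indexer chose can be found.
SUBJECT_INDEX = {
    "household composition": {"bhps_wave1.pdf"},
    "expenditure": {"fes_1996.pdf"},
}


def tokenize(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(documents):
    """Map every word of every document to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_full_text(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


FULL_TEXT_INDEX = build_full_text_index(DOCUMENTS)

# "fuel" was never chosen as a subject term, so the subject index cannot see it,
# while the full-text index finds the questionnaire that asks about it.
print(SUBJECT_INDEX.get("fuel", set()))
print(search_full_text(FULL_TEXT_INDEX, "fuel"))
```

The sketch also shows why proof reading matters: a word that was never converted to machine-readable text never enters the full-text index, and so can never be found.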
The Question Bank tries, wherever possible, to send users directly from any questionnaire to its dataset, held elsewhere. Such attempts at helpful cross-linking are generally confounded by the fact that relevant documents are stored in databases that cannot be directly hyperlinked at present, since databases can only generate web pages 'on the fly' as the result of a specific enquiry. Instead the most specific page to which the user can be redirected is often only at the very highest level (e.g. BIRON). MIMAS is an exception in this respect in that each survey has its own HTML root documents or web pages.
Document and Resource Format and Structure
The Question Bank is a website. This means it sees itself as part of the Internet, rather than simply using the Internet as a delivery system. The team tries to be conscious of this at all times and it was to a web constituency, using standardised web technology, and coping with familiar web based problems, that the initial decisions about document format were directed.
Document Format
Essentially all material in the Question Bank falls into one of two categories:
- Questionnaires
In the Question Bank these long and complex documents form the bulk of converted objects and any strategy of representation must reflect this in an efficient and robust manner.
Initially many documents existed as paper only, with precise and proven layout conventions, and often designed for ease of interviewer use as much as with the interviewee in mind. Reproduction needs to maintain layout, produce a small, easily transmissible document, whilst maintaining searchability to the level of the individual word or phrase. Furthermore these documents need to be visible and searchable on a wide variety of software configurations, all conceivable combinations of platform and browser. Given also that the resource is aimed primarily at academics, students and those working in large institutions with dinosaur IT departments and budget committees, it was understood that 'cutting-edge' technology was unlikely to be encountered, and hence that Question Bank documents should not be too innovative in format or structure. Network administrators are also loath to disturb the fragile equilibrium of their systems. Any downloadable plug-ins, system extensions or applications are unlikely to be available to users in such large organisations.
Because conservation of document layout and low download size (kB) across a broad platform base were the key initial criteria, the decision was taken to use Adobe Acrobat software to convert the questionnaires into Portable Document Format (PDF). A PDF file is produced from PostScript, theoretically one of the only constants in the world of computing, since it is the language that most operating systems use to talk to printing machines. The decision to use this format has proven a very good one. More and more this is the format used when a web designer wants to provide a document based on a print original and where layout and typography must be preserved. Government publications and technical manuals are increasingly published primarily rather than secondarily in this file format.
A plug-in is needed to view PDFs, but given its general utility and free availability it is normally loaded on most networks as a standard. PDFs are small, fast to download, stable, manipulable, and common. They are also searchable, within the application (PDF Viewer) or from outside the document with the right search engine. The Question Bank search engine MUSCAT was chosen explicitly for its ability to see into PDFs (it can also look into Microsoft Word, Excel and PowerPoint documents as well as of course HTML or web pages). General search engines are rarely configured to search the contents of non-HTML documents (many look no further even than the document TITLE and META attributes) but with PDF use in exponential increase this may be a service that one or more of the general engines might begin to offer soon.
The largest mass of Qb documents were converted to PDF format from paper using Acrobat Capture, an archive optimised Optical Character Recognition (OCR) package (not to be confused with the lower quality option of Adobe Acrobat). This method entails a proof reading step that most resource providers omit. Capture will produce an eye-readable document from a scanned image in seconds. At the machine-readable level however all is not as it seems. The view of a page will contain fully converted words, those recognised by the OCR algorithms, in combination with images of words where the OCR package has not recognised characters that were badly scanned, smudges, small, non-left to right in orientation, etc. Unless these images of words are manually replaced, by comparison with the original, the document may be left with upwards of 30% non machine-readable content. Large archives rarely take the time to perform this arduous and time-costly exercise.
Increasingly however, documents are available to the archivist in electronic format, often as last draft Word or text files (NB rarely as the post-proofing version that the typesetter and author finally signed-off to the printer of a document, the definitive final public or end-user version). Conversion to PDF or HTML is easier in this case and it is more a case of which package to use than whether it is worth the effort or not. Or so it might seem. If the criteria of maximum cross-platform viewability and full searchability are rigorously applied the choice of conversion software shrinks rapidly.
Simply converting to HTML (web page) using basic conversion programs leads to one of two consequences:
- Non standard and 'dirty' HTML which can be viewed only on certain platform-browser combinations (Microsoft Word and FrontPage please stand up)
- Change or loss of layout in the HTML output, a disaster for most questionnaires. As windows are resized web page contents flow to fit the new area, so layout is never an element that can be controlled in an HTML document. The most recent conversion packages claim to have overcome these problems but only by recourse to Dynamic HTML (DHTML) Layers and/or JavaScript elements, technologies with notoriously different implementation on Netscape Navigator and Microsoft Internet Explorer browsers, and thus effectively barred from use by the web developer who wishes to reach a large audience.
Using Adobe Acrobat Distiller most electronic documents can be converted to PDF with their initial format, layout, fonts, and graphic elements intact (relatively), and often at one-tenth the size of the original document. Such documents load faster, and are more cross-platform than HTML equivalents or even than the originals. For consistency and reliability, PDF is the format the Question Bank team chose for nearly all questionnaire material. Again it is possible to use simple non-labour intensive methods to produce PDFs from electronic documents, such as the PDF Writer plug-in for MS Word. Acrobat Distiller however is the only way to ensure that the contents are faithfully reproduced at maximum compression (smallest size) and with full searchability. The Question Bank optimises AND splits the finished PDF file so that few documents should take longer than 15 seconds (at 28.8 kbps) to download. Splitting and cross-linking documents is a MANUAL task which few archives spare the time to perform.
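The 15 second target translates into a rough size budget for each PDF chunk. The following Python sketch does the arithmetic, assuming that the 28.8 figure refers to a 28.8 kilobit per second modem line, that one kilobyte is 1000 bytes, and that there is no protocol overhead or line noise (so real downloads will be somewhat slower). The function names are invented for the illustration.

```python
import math


def download_seconds(size_kilobytes, line_speed_kbps=28.8):
    """Idealised transfer time: 8 kilobits per kilobyte, no overhead."""
    return size_kilobytes * 8 / line_speed_kbps


def max_chunk_kilobytes(target_seconds=15, line_speed_kbps=28.8):
    """Largest file that fits within the target download time."""
    return target_seconds * line_speed_kbps / 8


def chunks_needed(total_kilobytes, target_seconds=15, line_speed_kbps=28.8):
    """Number of pieces a document must be split into to meet the target."""
    budget = max_chunk_kilobytes(target_seconds, line_speed_kbps)
    return math.ceil(total_kilobytes / budget)


# 28.8 kilobits/s for 15 s is 432 kilobits, i.e. a budget of about 54 kB per chunk.
print(round(max_chunk_kilobytes(), 1))
# A hypothetical 500 kB questionnaire would need splitting into this many parts:
print(chunks_needed(500))
```

On these assumptions a questionnaire of a few hundred kilobytes needs to be split into several cross-linked parts, which is why the manual splitting step is significant.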
- Infrastructure
All non-questionnaire material forming the structure that contains, discusses, orders, and indicates the location and scope of the Qb questionnaires.
In the Question Bank all of this material is in HTML format. But it is not enough to say 'put it into HTML'. Again tough decisions are necessary at this stage. All coding in the Question Bank aims to comply with the World Wide Web Consortium (W3C®) recommendations for HTML 4, Cascading Style Sheets (CSS) and the JavaScript Document Object Model. I know it's fashionable to trash 'big bad Bill' but Internet Explorer 4 and 5 are the ONLY web browsers that reach these standards.
For some obscure reason IT administrators all load up Netscape Navigator, which in its most recent manifestation, Navigator 4.6 is rubbish on HTML4, worse on CSS, has its own peculiar manifestation of the DOM and crashes the operating system every time you try to click anything too fast. It remains to be seen whether Navigator 5 will make any concessions to the confused and riled general web using community, but until that time we can only recommend using IE4 as the browser of choice to view Qb documents. Unless of course you use a Mac where IE4 is worse than Netscape 3 (a virtual 1995 situation).
In preparing the Question Bank infrastructure it was necessary to conclude that although more than half of all browsers are now IE3 or above, in our constituency of students, researchers and survey professionals a generally unrepresentative majority will still be stuck with version 3 browsers, predominantly the four year old Netscape 3, or even older versions (I saw Netscape 1 on a colleague's laptop the other day). This may be compounded by low resolution monitors and long average download times due to slower software on operating systems as old as Windows 3.11. Users might even have highly restricted Internet access as is often the case in organisations that have yet to realise that exclusion from information is no longer a viable economic strategy.
Vexing as all this might be to the web developer it can be overcome with thoughtful use of colour (which comes free), diagrammatic representation (images can get large though), flat not deep structure (everything only three clicks from the home page), liberal cross-linking (in recognition that users do not read but scan and download), elegant degradation (writing HTML pages that still look OK in successively lower browser versions), and the use of scripting to route users to alternative interfaces depending on their machinery. The Question Bank uses all of these techniques in an attempt to make material as attractive and navigable as possible. Feedback please?
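The last of these techniques, routing users to an alternative interface depending on their machinery, can be illustrated with a minimal sketch. It is written in Python as if it ran on the server, which is an assumption made for the sake of a self-contained example and not a description of the Question Bank's own scripts. The interface names ("full", "basic", "plain") and the version thresholds are hypothetical; the User-Agent strings in the usage lines follow the general form sent by browsers of this generation.

```python
import re


def browser_generation(user_agent):
    """Guess (family, major version) from a browser's User-Agent string."""
    # Internet Explorer announces itself as "Mozilla/x.x (compatible; MSIE n.nn; ...)",
    # so the MSIE token must be checked before the leading Mozilla version.
    msie = re.search(r"MSIE (\d+)", user_agent)
    if msie:
        return "msie", int(msie.group(1))
    mozilla = re.match(r"Mozilla/(\d+)", user_agent)
    if mozilla:
        return "netscape", int(mozilla.group(1))
    return "unknown", 0


def choose_interface(user_agent):
    """Pick the richest interface the browser is likely to display properly."""
    _family, major = browser_generation(user_agent)
    if major >= 4:
        return "full"   # frames, style sheets and scripted navigation
    if major == 3:
        return "basic"  # frames and simple HTML only
    return "plain"      # unframed pages for old or unrecognised browsers


print(choose_interface("Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"))
print(choose_interface("Mozilla/3.01 (Win95; I)"))
print(choose_interface("Lynx/2.8.1rel.2 libwww-FM/2.14"))
```

Falling back to the plainest interface for anything unrecognised is the cautious choice: an unknown browser is shown pages that degrade gracefully and is never sent to features it may not support.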
Finally the Question Bank uses frames. This bucks all recent trends. Since the main content of the Question Bank is in PDF format users might rapidly lose their place in the web site because PDFs, which might also take a while to download, will fill the browser window unless a navigation frame is added around the document loading frame. A necessary evil.
Links to the Question Bank Information Space
Supposedly the confidence of a website is measured in its willingness to send viewers off to another place. This is the raison d'être of well-conceived gateways.
There are moves to try and delineate structure from the morass of the WWW, even attempts to build intelligent search agents able to rank the worth of a resource automatically using a mechanical process analogous to that used in human-mediated ranking (e.g. Yahoo and Alta Vista are really sets of lists collected and grouped by their content and utility). Such schemas would divide useful resources into one of two types:
- A Hub
A web page that links to many authorities
- An Authority
A web page that is pointed to by many hubs
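The two definitions are circular, each depending on the other, and link-analysis schemes of this kind usually resolve the circularity by iteration. The sketch below is in the spirit of Kleinberg's hubs-and-authorities (HITS) procedure; it is not taken from any particular search agent, and the page names and links are invented for the example.

```python
import math

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "gateway_a": ["archive", "question_bank"],
    "gateway_b": ["archive", "question_bank", "stats_office"],
    "archive": [],
    "question_bank": ["archive"],
    "stats_office": [],
}


def hubs_and_authorities(links, iterations=50):
    """Iteratively score every page as a hub and as an authority."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Rescale both sets of scores so they stay bounded between rounds.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


hub_scores, authority_scores = hubs_and_authorities(LINKS)
print(max(authority_scores, key=authority_scores.get))
print(max(hub_scores, key=hub_scores.get))
```

In this toy graph the page with the most inbound links from well-connected pages comes out as the strongest authority, and the gateway with the widest set of outbound links as the strongest hub. A page can score on both measures at once.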
If link density is to be an important criterion for visibility on the Internet the designer must decide how many, AND EXACTLY WHICH, links to collect in their website. Will users click and rush off, never to return? Will they understand that they have left at all?
The Question Bank team does not pretend to be producing a Gateway but we repeatedly discover some new resource or centre that we knew nothing about. It is not uncommon whilst teaching or demonstrating to find that the structure of the survey Information Space (meaning a kind of sub-domain of the WWW) is poorly understood. Furthermore repeatedly we are asked if we archive datasets.
Clearly the aim of the Question Bank is to become an 'authority' but since so many users ask to be redirected there is an inescapable trend toward becoming a 'hub' as well. The decision was taken early that this is almost a separate aspect of web resource provision and the Question Bank employs a part-time researcher at the University of Surrey to collate hyperlinks for us. Currently the results of her work are seen in the links and bibliography sections of each of the Qb topic regions.
We are also constructing an InfoSpace section in the Qb site. This is a diagrammatic representation of the data archives, government agencies, survey associated sites (some have their own website) and related research and co-ordination bodies that concern themselves with survey design and construction. The Question Bank is very easy to step out of.
On a technical level, if you are in a framed site, do you allow a hyperlink to replace your site in the browser window, or do you launch the remote website in a separate, new window? Current feeling is that confidence will out, and that you should always launch into THE SAME window, allowing users to carry on using the back button which gives them control over their mental map of their individual session history. This is generally the course chosen in the Question Bank.
Finding material in the Question Bank
Finding stuff is what it's all about on the web. Other than making as much material as possible visible to external robots (general search engines), any site of any size really needs its own search engine. Important too is a structure and site design of minimum necessary complexity. Although many websites are very small their interfaces are often very busy. With a larger resource a more sedate approach is essential and here the accepted standards of the web, the general user paradigm, are best followed.
In the Qb technological frills have been kept to a minimum, and where clever stuff does exist, it is generally completely invisible to the user. As mentioned, the largest single aid in finding material is actually the frameset which contains buttons linking to all main areas of the site, visible at all times. A drawback is that no individual document can be simply bookmarked (but try IE5, it does it!).
The Structure of the site is arranged around the unit of interest, The Survey, and the unit of description, The Topic, with subsidiary, or complementary areas such as the CAPI section, the author list, help pages, contact forms, etc. clearly indicated as autonomous units. Beneath this apparent structuration however there is multiple cross-linking and as much sharing of material as possible.
The Question Bank has been designed to help users by offering multiple routes and methods to find relevant material. There are however three main search strategies:
- Survey menu
Some Question Bank users will come looking for material that they believe has been included in particular surveys known to them. For example, they may recall that a particular topic was included in the British Household Panel Survey in a particular year and wish immediately to access and perhaps browse through the questionnaires for that year. Their needs are accommodated by a menu system that lists source questionnaires by survey and year. In most cases a sub-index is provided for the longer questionnaires to help narrow down the search.
- Topic menu
Many users will approach with a broader perspective, knowing that they are interested in a particular topic area, say ‘ethnicity’, and wanting to review the relevant concepts and variables and the measurement approaches which have been used on major benchmark surveys. Then, having read something of the conceptual background and the methodological problems of the topic area, they may wish to proceed via menus or hypertext links to look at the way the topic has been tackled in practice in questionnaires.
Such users can access the Question Bank ‘top-down’ via the topic list menu, which in turn leads on via hypertext links to commentary on various aspects of the topic area, or to survey examples. The topic list is also a quick and general way for users to check whether the general topic area in which they are interested is covered by the Question Bank. Users can then reach relevant survey material either through installed hypertext links or by searching using MUSCAT.
- MUSCAT search engine
The topic list and the questionnaire indexes provide a quick, but inevitably rather rough and ready, means of identifying whether and where broad topics are covered by the Question Bank. A more focused and powerful method of searching is to use an intelligent search engine that indexes the entire contents of the Question Bank. After considerable investigation and testing we have selected the MUSCAT™ search engine, which can scrutinise both HTML and PDF files. Search terms can be entered as free text including key words, and MUSCAT will search the entire content of the Question Bank for those terms, producing a list ranked in terms of relevance. MUSCAT uses probabilistic retrieval, allowing users to type, just as they would speak, a natural sentence or phrase expressing their interest.
Feedback Mechanisms
As stated earlier, feedback is the single most crucial tool in website development. This has been the second least impressive area of performance in our record (the worst being the commissioning of topic material for the Question Bank, see later in this article). Although there are multiple methods for online feedback from the Qb site, response is very poor, and this even though the access logs indicate an impressive hit rate. It would seem that the web user is essentially a downloading machine, that invitations to contribute are almost always ignored, and that the provider of any resource would do well to build face-to-face feedback opportunities into any development program.
The Question Bank resource is under continuous and intensive development. We invite queries, complaints, suggestions for inclusion, or any other form of feedback, positive or negative. We have constructed several ways for users to comment on the site or its content, from quick observations to a considered critique:
- Suggestion Box - A free form text box - always visible in the frameset
- Quick Comment - Quickly tell us what you were looking for and whether you found it
- Feedback Form - Provide us with a longer and more structured example of your views
- cassqb@natcen.ac.uk - email us with any comments or suggestions
- a.guy@natcen.ac.uk - email Adam Guy for technical or navigation guidance
- Write to - The National Centre for Social Research, 35 Northampton Square, London EC1V 0AX.
CAPI documentation
The popular image of the social survey interviewer is that of someone carrying a clipboard or folder with a paper and pencil (PAPI) questionnaire that they complete in writing. For most of the surveys listed in the Survey section of the Question Bank, however, this has been replaced during the last decade by the interviewer carrying a portable computer on which the questionnaire resides as a program for Computer Assisted Personal Interviewing (CAPI). This represents a major technical advance in the survey process, but also poses a challenge to the professional survey researcher to make the CAPI interview intelligible to the lay person.
The Question Bank makes available on our site the version of the survey questionnaire published by the survey organisation producing the survey. Sometimes this is quite similar in appearance to a paper questionnaire. In the case of more complex surveys, for example the Family Resources Survey or the Health Survey for England, it bears less resemblance to a paper questionnaire, and has alternating sections, showing respectively the actual question wording and the routing followed through the questionnaire for different respondents according to the way in which they have answered earlier questions.
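The idea of routing can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the question names, wordings and routing rules are invented for the example and are not taken from any survey in the Question Bank, nor from any real CAPI package. Each question carries a rule that looks at the answers given so far and names the next question to ask.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Question:
    name: str
    wording: str
    # Routing rule: given the answers so far, return the name of the
    # next question, or None when the interview is finished.
    route: Callable[[Dict[str, str]], Optional[str]]


def build_questionnaire() -> Dict[str, Question]:
    """Build a tiny, purely hypothetical three-question instrument."""
    questions = [
        Question(
            "Working",
            "Did you do any paid work last week?",
            lambda a: "Hours" if a.get("Working") == "yes" else "Looking",
        ),
        Question("Hours", "How many hours did you work?", lambda a: None),
        Question("Looking", "Were you looking for paid work?", lambda a: None),
    ]
    return {q.name: q for q in questions}


def path_for(answers: Dict[str, str], start: str = "Working") -> List[str]:
    """Return the ordered question names one respondent would be asked."""
    questionnaire = build_questionnaire()
    asked: List[str] = []
    given: Dict[str, str] = {}
    current: Optional[str] = start
    while current is not None:
        asked.append(current)
        if current in answers:
            given[current] = answers[current]
        current = questionnaire[current].route(given)
    return asked


if __name__ == "__main__":
    print(path_for({"Working": "yes", "Hours": "37"}))
    print(path_for({"Working": "no", "Looking": "yes"}))
```

Two respondents who answer the first question differently see different questions afterwards, which is why the documentation of a CAPI instrument has to show the routing alongside the wording.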
The National Centre (with support from the ESRC Research Programme into the Analysis of Large and Complex Datasets, ALCDS) and ONS (TADEQ) are both attempting to build software tools that generate and present CAPI documentation in forms that users with different levels of experience and different requirements can use.
The Question Bank team is grappling with problems of making the way CAPI surveys work in the field as transparent as possible to Question Bank users, and is writing material to explain some of the features of these new styles of questionnaire for our site. This will become an increasingly important issue for the survey researcher as we move into the twenty-first century.
Metadata - Data about data
The proliferation of Internet repositories, each with its own navigational, location, storage and indexing systems, means that the user often has to learn new skills each time a resource is discovered. Finding relevant material can become extremely time-consuming, institutional resources are wasted in duplication, and crucial data is often hidden within a plethora of proprietary databases, accessible only through idiosyncratic gateways and invisible to general web searching tools.
In recognition of this problem many initiatives are under way to make data and documents visible across a variety of processes, gateways and search tools. Metadata, data about data, are tags, descriptions or indexes attached to resource elements and are intended to unify or standardise the key attributes of information objects. Metadata conventions will allow researchers using diverse tools to classify, store, access and retrieve key information using the web as the common platform. Competition is intense, and competing claims to have produced the standard system are many.
Of current interest are the NESSTAR project, which aims to enable users to search for relevant data across several countries in one action, and the Dublin Core Metadata Initiative, which recommends a 15-element metadata set for describing Web resources. The World Wide Web Consortium (W3C®) are promoting the adoption of Extensible Mark-up Language (XML) to enable archivists to 'wrap' documents in metadata envelopes, thus improving their visibility to the next wave of XML-capable search engines. The W3C are also involved in the development of the Resource Description Framework (RDF), a foundation for processing metadata that will allow machines (worms and intelligent agents) to 'understand' rather than just 'read' documents.
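As a rough illustration of what such a metadata 'envelope' might look like, the Python sketch below builds a small XML record from the 15 Dublin Core elements. It is not how the Question Bank or any of the projects named above actually store their metadata: the wrapper element name `metadata`, the function name and the field values are assumptions made for the example. Only the element names and the namespace URI come from the Dublin Core element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields: dict) -> ET.Element:
    """Wrap the given fields in a hypothetical <metadata> envelope."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # wrapper name is an assumption
    for name in DC_ELEMENTS:
        if name in fields:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return record


# Illustrative values only; not a real catalogue entry.
example = dublin_core_record({
    "title": "Example survey questionnaire",
    "type": "Text",
    "language": "en",
})
xml_text = ET.tostring(example, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Because every record uses the same small vocabulary, a search tool that knows nothing about the site that produced it can still pick out the title, language or date of a document.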
The Question Bank team is actively researching these developments and is in contact with those involved in key initiatives in the fields of social surveys and sociological resource provision. Documents held in the Question Bank are consistently labelled and, since the resource is semi-structured and visible on the web rather than hidden within a database architecture, they could easily be assimilated by any, or several, of these schemes.
Conceptual Structure
Users
In developing the Question Bank we primarily aim to help:
- Researchers devising their own survey questionnaires, by providing easily-accessed illustrations of how the topics with which they are grappling have been handled/measured in professionally designed surveys.
- Secondary analysts of survey data, either at the stage where they are seeking out surveys containing material of interest to them, or at the stage when, having worked with particular survey data sets, they wish to learn more about the underlying survey processes and their likely strengths, weaknesses and limitations.
- Teachers and students of survey methods by providing text and examples on the wording of questions and the construction of questionnaires, and online presentations and exercises for use in workshops.
Indexing
I have already said a lot about how indexing can lead to an inflexible, brittle and user-unfriendly organisation schema. It is also a load of work. Users want lists though, and in recognition of this we have begun to develop keyword indexes and to extract question examples for the topic or other subject-led pages. This is a slow process and relies essentially upon the contribution and time of experts, a rare commodity.
Surveys
Research teams drawn from professional survey organisations and survey sponsoring organisations have created the Questionnaire facsimiles available in the Question Bank. Wherever possible, the details of the responsible survey organisation are given. These organisations support the idea that other researchers should be able to copy questions and use them in their own surveys. Material in the Question Bank is reproduced by special permission of the copyright holders of published documents in which the material appeared.
Question Bank staff have taken great care to reproduce the questionnaire instruments accurately, but the originators of the questionnaires are the authoritative source of knowledge about the questions and their development. Neither the originators of the questions nor the Centre for Applied Social Surveys can take any responsibility for use of the questions by others, or for providing advice to individuals on the design and use of questions.
The Question Bank contains questionnaire examples from the following surveys or survey groups:
- Links to the British Crime Survey website
- British Election Survey
- British Household Panel Survey
- British Social Attitudes Survey
- 1991 UK Census of Population
- English and Scottish Church Censuses
- European Household Panel Survey
- Family Expenditure Survey
- Family Resources Survey
- Family and Working Lives Survey
- The Fourth National Study of Ethnic Minorities
- General Household Survey
- Health Survey for England
- Housing Attitudes Survey
- Labour Force Survey
- National Child Development Survey
- National Survey of Sexual Attitudes and Lifestyles
- National Survey of NHS Patients
- National Survey of Voluntary Activity
- National Travel Survey
- People, Jobs and Recession (from 1984)
- Survey of Activity and Health
- Survey of English Housing
- Women and Employment Survey
- Workplace Employee Relations Survey
Coming soon are:
- British Election Panel Survey
- Dietary and Nutrition Survey of British Adults
- Scottish, Welsh (also in Welsh) & Northern Ireland Referendum Studies
- Scottish and Welsh (also in Welsh) Election Studies
- Welsh Health Survey (also in Welsh)
For continuous surveys the aim is to hold copies of versions of the survey fielded since 1991 and, for both continuous and one-off surveys, to display all questionnaires (e.g. household, individual and proxy schedules) for the appropriate year. Surveys are being added to the Question Bank continuously. A number of academic surveys with national coverage on particular topics will be added during 1999.
If you would like to nominate a survey to add, please email the Question Bank at:
cassqb@natcen.ac.uk.
Survey inclusion
The criteria we have used in selecting surveys are:
- UK sample coverage
Ideally a Question Bank might aim to cover all suitable surveys conducted in countries sharing a common language and culture. If we take English-speaking countries as our standard, that would bring into scope a very large number of excellent surveys conducted in North America, in Australia and so on. Unfortunately, the resources of the CASS Question Bank are not sufficient for us to aim for such broad coverage and we have confined ourselves to surveys conducted in the United Kingdom.
- Social surveys only
Surveys for which questionnaires are included in the Question Bank are all social surveys. Commercial market research surveys and business surveys directed to organisations are not in general included in the Question Bank. In most cases the population units which the surveys are intended to study are either individual persons, or domestic groups such as households or families. Surveys deal with a very wide range of topics that relate to the circumstances, behaviour and attitudes of these units.
- Mainly large scale surveys
The Question Bank focuses mainly on large-scale quantitative surveys. Most of them have quite long and complex questionnaires, administered in the field, or over the telephone, by trained social survey interviewers. A high proportion of the surveys included are conducted either by, or for, central government departments. Others are major academic surveys. Many are repeated, continuous or longitudinal surveys that produce an annual series of published results.
- Benchmark surveys
Another main reason for selecting the questionnaires of particular surveys for inclusion in the Question Bank is that these national-scale surveys are generally treated as benchmarks against which other surveys in the same topic areas can be compared.
The criteria for selecting surveys as benchmarks are:
- that the survey should have been professionally developed and conducted to a high technical standard
- that it should cover a national reference population
- that it should be a prime current source of information on the important social science topics that the Question Bank sets out to cover.
- Conducted mainly since 1991
The Question Bank aims to keep up with the constantly increasing tempo of new questionnaire instruments coming on stream, and the release of survey datasets to the Data Archive. Retrospectively, we decided to try to cover the period from 1991 (a Census year) onwards, but not to attempt systematic coverage of the period before 1991. However, a number of surveys conducted before 1991 are still used as benchmarks, or exemplify particular innovations in question or data collection design. For these we have made exceptions to our rule, so that the Question Bank contains questionnaires for selected surveys conducted during the nineteen eighties.
- Professionally designed
Within the field of UK surveys conducted over the past 20 years or so, the questionnaires included in the Question Bank have been chosen partly because all have been developed by leading professional survey organisations operating in Great Britain. That usually means that the question developers have had considerable survey experience to draw on.
- Quality assured
Given their origin, it can be assumed that the questions reproduced in the Question Bank have also been pilot tested, to make sure that they seem to work satisfactorily for the population of British respondents at whom they are aimed. Such piloting will normally have taken the form of rehearsal-type field tests, plus scrutiny of the data yielded by the questions for omissions and anomalies.
Such testing weeds out, for example:
- questions which respondents may find baffling because of the concepts, syntax or vocabulary used
- questions which some respondents are unwilling or unable to answer
- questions which elicit obviously irrelevant or inconsistent answers
- questions which fail to capture part of their intended universe of content
- questions which produce unduly skewed distributions of responses.
For some questions special tests have been done, over and above standard piloting, to check that different respondents understand them in the same way and that the answers obtained are sufficiently valid, accurate and statistically reliable for their purpose. Such tests may involve, for example, cognitive testing of the way in which the questions are answered and controlled empirical comparison of the results obtained by different question forms or special validity checks. These methods require extra time, trouble and expense and are the exception, rather than the rule. Nevertheless the questions used in the Census of Population, for example, have been subject to very extensive formal testing of this kind and there have been similar question testing and evaluation programmes in certain other areas. The Question Bank contains some references to the results of question testing, where available.
Questions contained in the Question Bank are likely, on the whole, to perform better as means of collecting quantitative information for particular purposes than questions which someone coming fresh to a survey topic, without previous question drafting experience, might invent for themselves. However, there can be no such thing as the ideal question on a given topic for every application, only questions which are good relative to the purpose for which they were intended and within the constraints of a particular data collection situation. Users should read the commentary on the various approaches to questionnaire design exemplified in the example questionnaires.
- Harmonised question forms
Some topics recur in many different social surveys in the Question Bank. They include, for example: demographic information, questions to establish household structure and housing circumstances, economic activity topics, income topics and so on. Since the questions devoted to these topics typically provide the analysis framework within which other topics are analysed it is particularly important to users of survey data that the framework be standardised, so that the results of different surveys can be compared and aggregated.
In 1995, in recognition of this, the Office for National Statistics (ONS), which runs various important government surveys included in the Question Bank (e.g. Labour Force Survey, Family Expenditure Survey, General Household Survey), initiated a programme of work and negotiation with survey sponsors which aimed to arrive at a set of question wordings harmonised across surveys. The result was a controversial booklet entitled Harmonised Questions for Government Social Surveys, which sets out the harmonised forms. These have now been, or are in the process of being, adopted in all major continuous government social surveys.
By agreement with ONS the Question Bank contains the full text of this booklet, as well as of the two subsequent updates (1996, 1997), and a quick reference table that indicates which question blocks were used in which years. It will be useful to all questionnaire designers who wish to make their own surveys comparable to major standard government surveys covering the same topics, particularly since the question forms are arranged in a concise way under topics.
The ONS Harmonised Questions website is the authoritative channel for the latest information on this process.
Topics
The area in which the Question Bank or any medium sized resource should be strongest is in its meta-narrative. Material that elucidates the purpose and theoretical underpinning of a resource, including extensive illustration by example, is the bedrock of good website design. It is precisely to this type of online writing that the web, with its ability to hyperlink, most lends itself. The topic area has been the hardest section of the Question Bank to populate.
Reasons are threefold:
- RAE
Web-disseminated articles are not generally refereed and are therefore inadmissible for the Research Assessment Exercise, which measures and ranks the performance of non-independent academic institutes. There are no 'brownie points' to be gained by academics in writing for the web.
- Time
Although the Question Bank is based (physically) in centres of excellence in survey design, the commercial imperatives touched upon in the first section mean that the immense time investment needed to pick over the Qb material and write discursive essays has often proved prohibitive when we have approached people for submissions.
- Writing for the web
This is a new art. Long, convoluted and wordy documents (like this one) cannot be easily read onscreen. Users also approach browsing in a 'grab and get' frame of mind. They simply will not read carefully. Articles and explanatory material therefore need to be bite-sized, repetitive and heavily hyperlinked. Writing like this is something academics and specialists in their field are very unhappy about doing.
Those planning on creating a web resource should think carefully about where material will come from and even more critically about WHO will do the work to supply copy with which to populate the architectures in their minds.
In principle, the Question Bank aims to cover all topics of interest to social science that can be studied using the standardised quantitative social survey method. The potential range is very wide. In order to give some structure and make it easier for Question Bank users to find what they are looking for, we have provided a broad listing of 21 topic areas.
Currently 14 areas contain diverse material, often links to related websites outside the Qb, essays on key variable definition, or bibliographic lists:
- Crime and victimisation
- Demography
- Economic activity
- Education, qualification & training
- Ethnicity and race
- Family
- Gender
- Health, illness and disability
- Housing and household amenities
- Household definition and structure
- Income, expenditure and wealth
- Leisure and lifestyles
- Political behaviour and attitudes
- Religiosity
Soon to be linked are 7 more topics:
- Geography
- Social Attitudes in general
- Social class
- Social protection and care
- Travel and transport
- Voluntary associations
- Working life
Topic commentary
In addition to the questionnaire material and the excerpts from other published documents bearing on particular topics, the Question Bank aims to contain specially written critical and explanatory commentary. The commentary is intended to help users to understand the conceptual structure of each topic area and the way it is reflected in the structuring of questionnaires. It may also make users aware of other concepts and questioning approaches that may be closely related to the ones that they had in mind when accessing the Question Bank. The commentary includes discussion of any available objective evidence on the validity, reliability etc of the measures produced by questions used.
For each topic area, we aim to provide a summary account of the main concepts involved and of current approaches to measuring them using survey questionnaire methods. We aim, where possible, to directly link the commentary to relevant examples of sections of questionnaires in the Question Bank surveys. In addition, we provide bibliographic references to relevant social science research literature on the topic, and to other Internet sites containing salient information.
An editorial board has been set up to monitor the quality of material in the Question Bank, and to commission commentary from experts in their subject. The members of the Editorial Board are:
- Martin Bulmer (Chair, CASS)
- Angela Dale (Director of the Cathie Marsh Centre for Census and Survey Research, University of Manchester)
- Peter Halfpenny (Director of the Centre for Applied Social Research, University of Manchester)
- Jean Martin (Head of the Survey Methodology Unit, Office for National Statistics)
- Roger Thomas (Director, CASS).
Experts in particular topics are currently writing and reviewing material for the Question Bank. A new policy decision is that all material written for the ‘topics and areas’ will be attributed to a named individual, so that the source of material may be clearly identified. This will further the aim of creating within the Question Bank site an electronic encyclopaedia about survey research methodology.
Authors can be found through a list on the website, and their articles accessed directly from there.
Using the Question Bank
I don't intend to go into long descriptions about how to actually use the Qb site in this article. There are plenty of help pages on the site itself. I would like to point out the growing teaching area which makes available presentations that we have used to explain the purpose and use of the site and contains downloadable full-colour help-sheets and exercises. The aim is that teachers of research methods or survey methodology could integrate use of the Question Bank into their course schedule. As usual feedback would be nice.
Qb Staff
Roger Thomas is the Director of CASS. He is a senior member of the Survey Methods Centre at the National Centre for Social Research.
Martin Bulmer is the Academic Director of the Question Bank. He is Foundation Fund Professor of Sociology at the University of Surrey and Associate Director of its Institute of Social Research.
Adam Guy is Manager of the CASS Question Bank in the Survey Methods Centre, The National Centre for Social Research.
Tom Johnson is responsible for the quality of material in the Question Bank and he also works in the Survey Methods Centre, The National Centre for Social Research.
Christina Silver is a researcher in the Department of Sociology at the University of Surrey and has been responsible for finding material related to Qb topics on the Internet.
Stuart Peters, based in the Department of Sociology at the University of Surrey, produced Sociological Research Online for three years and is now involved in EPRESS, a project aimed at helping others to publish journals online. He helps maintain the Question Bank search engine.
Author Details
Before working on the Question Bank Adam managed the database, project led the website team and introduced an Intranet for a management consultancy. He holds an MSc in Social Anthropology from University College London. He has one daughter and another on the way.
The CASS Web address (URL):
http://www.natcen.ac.uk/cass/