The W3C Technical Architecture Group

Henry S. Thompson

Henry S. Thompson introduces the W3C Technical Architecture Group and its work.

Background: The W3C and Its Process

The World Wide Web Consortium (W3C) was set up by Tim Berners-Lee in 1994 to preserve and enhance the public utility of the Web for everyone, to “lead the Web to its full potential”. It is a consortium of industrial and institutional members (around 450 at the time of writing) who pay on a sliding scale proportional to size. It produces Recommendations which are widely recognised as de facto standards. The actual work of writing those standards is carried out by Working Groups mostly made up of representatives of members, aided by a permanent staff. At the moment there are over fifty active Working Groups, with over 700 members, working on around 100 documents at various stages of their progress towards Recommendation status. The permanent staff numbers around 60, each attached to one of the three host institutions: the Massachusetts Institute of Technology, in Cambridge, MA, USA; the European Research Consortium for Informatics and Mathematics, in Sophia Antipolis, France; and Keio University, in Tokyo, Japan.

The W3C manages its work according to a formal Process, with an emphasis on consensus and community review, which specifies a progression from Working Draft through Candidate Recommendation and Proposed Recommendation, before the Director (currently Tim Berners-Lee) seeks formal reviews from the membership and either approves publication as an official W3C Recommendation, or returns it to the Working Group for further work.

One of the responsibilities of the Director is to consider the architectural impact of Working Groups’ output, particularly of Proposed Recommendations. As the consortium grew, and the scope of its work expanded, it became increasingly difficult for one person to bear the responsibility for articulating ‘the architecture’. Working Groups needed a concrete expression of what came to be called “Web Architecture”, to which they and others could refer as the basis for planning and decision making. In 2001 the membership agreed to create a Technical Architecture Group (TAG), to take on the task of identifying and documenting the architecture of the World Wide Web.

TAG Makeup and Remit

The TAG has nine members: the Director ex officio and eight others who serve two-year terms. Of these eight, three are appointed by the Director and five are elected by the W3C membership (although they need not be associated with the W3C themselves). Although the Director is nominally the Chair, in practice he delegates this responsibility to one of the appointees.

The following photograph shows the current TAG membership, with the exception of Dave Orchard of BEA Systems:

Figure 1. TAG members: Norm Walsh (Sun Microsystems), Rhys Lewis (Volantis), Tim Berners-Lee (W3C), Henry S. Thompson (University of Edinburgh), (Vincent Quint, INRIA, membership term now expired), Noah Mendelsohn (IBM), Dan Connolly (W3C), Stuart Williams (HP), T.V. Raman (Google). Photo courtesy of Norm Walsh.

The TAG’s remit is described in the W3C Process document [1] as follows:

“[T]he mission of the TAG is stewardship of the Web architecture. There are three aspects to this mission:

“to document and build consensus around principles of Web architecture and to interpret and clarify these principles when necessary;
“to resolve issues involving general Web architecture brought to the TAG;
“to help coordinate cross-technology architecture developments inside and outside W3C.”

In practice this has meant that a lot of the TAG’s work has been a kind of industrial archaeology: exploring and analysing the ways in which the technologies which comprise the World Wide Web are used and abused, to try to articulate what is important and what is not, what really underpins the success of the Web so far, what is incidental and what actually threatens the success of the Web going forward.

TAG History

The primary focus of the first three years of the TAG was on documenting the architectural foundations of the Web in a clear and easily understood manner. The result was published at the end of 2004 as Architecture of the World Wide Web, Volume One [2], often referred to as ‘WebArch’. It is written in a relatively informal style, with illustrations, and many of its conclusions are expressed in succinct ‘principles’, ‘constraints’ and ‘good practice notes’, such as:

Principle: Global Identifiers Global naming leads to global network effects.

Good practice: Identify with URIs To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources.

Constraint: URIs Identify a Single Resource Assign distinct URIs to distinct resources.

As these examples show, WebArch tries hard to address the basic issues of web architecture clearly and straightforwardly, and as a result it has proved useful not just for the Working Groups of the W3C, but for teachers, students and the general public.

A short note on terminology: The TAG distinguishes three crucial participants in the thing at the heart of the Web, that is, links:

URI

The starting point. The TAG focuses on http: URIs, for example http://weather.example.com/oaxaca.

resource

The end point, which we say is identified by a URI. It can be anything at all.

representation

Something that can be sent in a message, typically from a server to a client, in response to a request.

WebArch includes the following picture of the relationship between these three:

Figure 2. The Oaxaca weather report on the Web
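
The three-way distinction can be made concrete with a small sketch. It is illustrative only: the Representation class, the in-memory table standing in for a server, the dereference function and the weather text are all assumptions invented for the example, not anything defined by WebArch. Note that the resource itself (the weather report for Oaxaca) never appears in the code; only its URI and a representation of it do.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Representation:
    """Something that can be sent in a message: bytes plus a media type."""
    media_type: str
    body: bytes


# Hypothetical stand-in for a server: maps a URI to the current
# representation of the resource that the URI identifies.
# The weather text is invented for illustration.
REPRESENTATIONS = {
    "http://weather.example.com/oaxaca": Representation(
        media_type="text/html",
        body=b"<html><body>Oaxaca: sunny</body></html>",
    ),
}


def dereference(uri: str) -> Representation:
    """Return a representation of the resource identified by an http: URI."""
    if urlsplit(uri).scheme != "http":
        raise ValueError("this sketch only handles http: URIs")
    return REPRESENTATIONS[uri]  # KeyError if nothing is known about the URI


rep = dereference("http://weather.example.com/oaxaca")
print(rep.media_type, len(rep.body))
```

The same URI could return a different representation tomorrow, or a different format to a different client, while still identifying the same resource.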

WebArch also distinguishes an important subclass of resources, called information resources, as those resources ‘all of [whose] essential characteristics can be conveyed in a message.’ Most of the URIs we browse, search for and author, identify information resources: Web pages, images, product catalogues, etc., but URIs can also be created for non-information resources, such as:

concepts (http://purl.org/dc/elements/1.1/creator) or
real-world objects (http://www.w3.org/People/Berners-Lee/card#i),

typically in the context of the Semantic Web.

Since the publication of WebArch, the TAG has been in more reactive mode, responding to requests from within and outside W3C to address issues and reconcile competing practices. Some of the issues which have been raised and addressed, usually by publishing short documents known as ‘findings’, are listed below:

Is an XML namespace a small set of qualified names, such that it makes sense to talk about adding to a namespace, or is it an infinite set of qualified names, only a few of which have definitions at any given time? The TAG ratified the latter view: The Disposition of Names in an XML Namespace [3].
Is the metadata in a message, for example the “Content-Type” header, definitive, or advisory? The TAG confirmed that it is definitive, and explained why: Authoritative Metadata [4].
If a URI does not identify an information resource, that is, one which is pretty completely represented by a message (for example, a message consisting of an HTML document), but rather identifies something like Beethoven, or the Eiffel Tower, about which a server has some meta-information, perhaps in the form of RDF, is it OK to supply that information if someone tries to retrieve from the URI? The TAG said “Yes, but not with a 200 response code – use 303 instead, to make the difference clear”: [httpRange-14] Resolved [5].
Often resource owners offer more than one representation of a resource – different formats, different languages, etc. Both human and machine consumers of such resources need help understanding the relationships between them. The TAG offers recommendations about link patterns and metadata for Linking Alternative Representations To Enable Discovery And Publishing [6].
Although strictly speaking URIs are opaque, and there is no required relationship between the structure of a set of URIs and the relationships of the resources they identify, in practice resource owners who publish representations of related resources use related URIs to identify them. The TAG has issued guidelines on The use of Metadata in URIs [7].
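
The httpRange-14 resolution in the list above lends itself to a short sketch. The following is a minimal illustration of the 200-versus-303 distinction, not a server implementation: the paths, the two lookup tables and the respond function are hypothetical names chosen for the example.

```python
from http import HTTPStatus

# Hypothetical site layout, invented for illustration.
# Paths whose resources can be fully conveyed in a message:
INFORMATION_RESOURCES = {"/docs/eiffel-tower.html"}
# Paths identifying things that cannot (here, the tower itself), mapped
# to a document that describes them:
DESCRIBED_BY = {"/id/eiffel-tower": "/docs/eiffel-tower.html"}


def respond(path: str) -> tuple[HTTPStatus, dict[str, str]]:
    """Choose a status code and headers for a GET request on `path`."""
    if path in INFORMATION_RESOURCES:
        # A representation of the resource itself can be returned.
        return HTTPStatus.OK, {}
    if path in DESCRIBED_BY:
        # Not an information resource: point at a description instead.
        return HTTPStatus.SEE_OTHER, {"Location": DESCRIBED_BY[path]}
    return HTTPStatus.NOT_FOUND, {}


status, headers = respond("/id/eiffel-tower")
print(int(status), headers)
```

A client that receives the 303 learns that what it fetches next is information about the thing named, not the thing itself.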

Current TAG Concerns

The TAG is currently engaged with a number of issues. In some cases draft findings are available, in others things are still at the preliminary fact-finding and discussion stage. The following sections give brief summaries of these issues and where the TAG is in its consideration of them.

Versioning

The TAG has been working on a number of aspects of the complex problem of versioning and extensibility for formally defined languages in general, and XML languages in particular, for over three years. The work has been both analytical – trying to pin down what the language-evolution aspects of HTML have been and to give clear and well-grounded definitions for the relevant terminology – and proactive, trying to identify and recommend good practice both for the schema languages which are used to define languages, and for the languages themselves.

The work is currently expressed in two draft findings:

The first, Extending and Versioning Languages Part 1 [8], is concerned with versioning as a whole, and with the definition of relevant terms.
The second, Extending and Versioning Languages: XML Languages [9], is concerned with versioning XML languages in particular, and with techniques for using W3C XML Schema to support different versioning strategies. A number of case studies are included.
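
As a rough illustration of what a versioning strategy can look like from the consumer's side, the sketch below shows a reader that keeps the elements it knows and silently skips the rest, so that documents written in a later version of a language can still be processed. This is one commonly discussed approach, offered here as an assumption rather than as a summary of what the draft findings recommend; the vocabulary, the sample document and the read_name function are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical "version 1" vocabulary known to this consumer.
KNOWN_FIELDS = {"given", "family"}


def read_name(xml_text: str) -> dict[str, str]:
    """Extract known child elements, ignoring any that are not recognised."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: (child.text or "")
        for child in root
        if child.tag in KNOWN_FIELDS
    }


# A document from a hypothetical later version that adds <middle>.
newer_document = (
    "<name><given>Ada</given><middle>King</middle>"
    "<family>Lovelace</family></name>"
)
print(read_name(newer_document))
```

The cost of this leniency is that the consumer cannot tell an extension from a mistake, which is part of why the choice of strategy needs careful analysis.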

(Almost) Always Use http: URIs

The TAG’s analysis of how the Web works, building on previous work, has identified a few key properties of the way http: URIs and the HTTP protocol combine, properties which are hugely powerful and beneficial. The TAG is therefore concerned by the number of new URI schemes (for example info:, xri:, doi:) and URN (sub-)namespaces (for example urn:nzl, urn:ietf:params:xml, urn:oasis:names:tc:ubl) being promoted for use in identifying resources on the Web, because they threaten not only to dilute that value for others, but also to fail to deliver the intended benefits to their own users.

Accordingly the TAG has undertaken to analyse the technical arguments most often advanced in support of new approaches to naming things on the Web, and, wherever possible, identify the ways in which these arguments misunderstand or misrepresent the properties of http:-based naming. This work, along with several extended examples, is available as a draft finding: URNs, Namespaces and Registries [10].

Passwords in the Clear

That user agents should not send passwords over the Internet in the clear, or trivially encoded, seems obvious; but formulating guidelines for user agents on when and how to warn users that they are at risk of doing so has proved surprisingly difficult. The current state of the TAG’s efforts to express this can be found in a draft finding: Passwords in the Clear [11].
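
To see what ‘trivially encoded’ can mean in practice, consider HTTP Basic authentication, where the user name and password are joined with a colon and Base64-encoded. The sketch below is an illustration chosen for this article rather than an example taken from the draft finding; the function names and the credentials are hypothetical.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header_value: str) -> tuple[str, str]:
    """Reverse the encoding, as anyone observing the request could."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("alice", "s3cret")
print(header)
print(recover_credentials(header))
```

Base64 is an encoding, not encryption: no key is needed to reverse it, so over an unencrypted connection the password is as exposed as if it had been sent in the clear.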

URI Abbreviation

The TAG has recently begun discussions on the potential architectural impact of a proposal to introduce a new means of abbreviation for URIs, known as Compact URIs (or CURIEs, for short) [12]. No conclusions have been reached so far.
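
The basic idea behind such abbreviation can be sketched in a few lines: a prefix stands for the leading part of a URI, and the remainder is appended to it. The code below is a simplified illustration that assumes the prefix:reference form and omits the rest of the Working Draft's syntax; the prefix table and the expand_curie function are hypothetical.

```python
# Hypothetical prefix bindings; in practice these would be declared by
# the host language or document.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand `prefix:reference` by concatenating the prefix's URI and the reference."""
    prefix, colon, reference = curie.partition(":")
    if not colon:
        raise ValueError(f"no prefix found in {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unbound prefix {prefix!r}")
    return prefixes[prefix] + reference


print(expand_curie("dc:creator", PREFIXES))
```

Because an abbreviation and a full URI both have the shape ‘something, colon, something’, a processor needs to know from context which of the two it has been given.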

The Self-describing Web

One of the key aspects of the Web as the TAG understands it lies in the extent to which it supports, to put it informally, ‘following your nose’ to find things out. This is not just a matter of the way the Web allows a user to click a link to go from one Web page to another, but also of the way Web-accessible resources carry with themselves a kind of audit-trail concerning their own interpretation, via for example media types and namespaces. The phrase ‘self-describing Web’ refers to this part of the Web’s value proposition. One draft finding has been published about one aspect of this, namely the question of what ‘best practice’ should be with respect to XML namespace documents. By this is meant the information resource, if any, whose representation can be retrieved from an XML namespace URI. The TAG is still working to identify the best way of combining current practice, particularly the use of RDDL [13], with the evident potential of the Semantic Web in this area. The most recent draft finding, Associating Resources with Namespaces [14], is somewhat out of date; more recent discussion can be found on the www-tag mailing list under namespaceDocument-8 background [15].
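
The first two steps of ‘following your nose’ can be sketched as follows: read the media type from the message metadata, then read the namespace URI from the root element of an XML representation. The helper names and the sample header and document are assumptions made for the example; the next step, retrieving something from the namespace URI, is exactly the area the namespace-document work addresses and is not attempted here.

```python
import xml.etree.ElementTree as ET
from email.message import Message
from typing import Optional


def media_type(content_type_header: str) -> str:
    """Return the media type from a Content-Type header value, without parameters."""
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_type()


def root_namespace(xml_bytes: bytes) -> Optional[str]:
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_bytes)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# Sample metadata and body, invented for illustration.
header = "application/xhtml+xml; charset=utf-8"
body = b'<html xmlns="http://www.w3.org/1999/xhtml"><head/></html>'

print(media_type(header))
print(root_namespace(body))
```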

The Future of (X)HTML

Another area where the TAG is in exploratory mode, where no draft finding has been issued, concerns the architectural background of the recent restructuring of the W3C’s HTML work. The message announcing the TAG’s interest in this area [16] introduced it as follows:

“Is the indefinite persistence of ‘tag soup’ HTML consistent with a sound architecture for the Web? If so, (and the starting assumption is that it is so), what changes, if any, to fundamental Web technologies are necessary to integrate ‘tag soup’ with SGML-valid HTML and well-formed XML?”

(By way of explanation: ‘By “‘tag soup’ HTML” is meant documents which are not well-formed XHTML, or even SGML-valid HTML, but which none-the-less are more-or-less successfully and consistently rendered by some HTML browsers.’)
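
The gap between the two worlds is easy to demonstrate. In the sketch below, a strict XML parser rejects a fragment with unclosed elements, while a lenient HTML parser reports the start tags it finds without complaint. The sample markup and helper names are invented for the example, and Python's html.parser is used only as a convenient lenient tokeniser, not as a model of how browsers recover from errors.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and <br> elements: not well-formed XML.
TAG_SOUP = "<p>Hello<br>world<p>Second paragraph"


def is_well_formed_xml(text: str) -> bool:
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


class StartTagCollector(HTMLParser):
    """Record the name of every start tag the lenient parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(text: str) -> list[str]:
    collector = StartTagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


print(is_well_formed_xml(TAG_SOUP))
print(start_tags(TAG_SOUP))
```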

The Future?

As well as driving the issues summarised in the preceding section to a resolution, what else is the TAG looking forward to considering? Some topics which we hope to consider in the near future are:

The architecture of the Semantic Web – what is it and how does it relate to the architecture of the ordinary Web: the same, overlapping, or distinct?
To what extent is peer-to-peer Internet usage consistent with the architecture of the Web as the TAG has articulated it? To the extent that it is not, is this a cause for alarm, and what, if anything, should the TAG consider doing?
The W3C has publicly committed to the notion of “One Web”, encompassing as wide a range of delivery media as possible. What are the architectural implications of this commitment?
The most frequently asked question of anyone seen as ‘responsible’ for the Web: ‘What are you going to do about spam/phishing/botnets/…’? Can the TAG say anything useful about security and trust for the Web?

Getting Involved

The TAG carries out all its work in public. In particular, there are two public mailing lists where TAG business can be observed:

www-tag@w3.org
High-bandwidth. Discussion of open and potential TAG issues. Announcements of draft findings. Agendas and minutes for weekly telcons and quarterly face-to-face meetings. Open to anyone to subscribe (send ‘subscribe’ to www-tag-request@w3.org) and to post. Publicly-readable archives [17].

public-tag-announce@w3.org
Low bandwidth. Announcements of findings, quarterly summaries of work undertaken. Open to anyone to subscribe (send ‘subscribe’ to public-tag-announce-request@w3.org), but closed to public posting. Publicly-readable archives [18].

The TAG home page [19] has links to a wide range of TAG-related information.

The TAG depends on the whole Web community to review its work and point it in new and fruitful directions – please help!

References

1. The W3C Team: Technical Architecture Group (TAG) http://www.w3.org/2005/10/Process-20051014/organization.html#TAG
2. Architecture of the World Wide Web, Volume One, W3C Recommendation 15 December 2004 http://www.w3.org/TR/2004/REC-webarch-20041215/
3. The Disposition of Names in an XML Namespace, TAG Finding 9 January 2006 http://www.w3.org/2001/tag/doc/namespaceState.html
4. Authoritative Metadata, TAG Finding 12 April 2006 http://www.w3.org/2001/tag/doc/mime-respect-20060412
5. [httpRange-14] Resolved: Message from Roy T. Fielding to www-tag@w3.org 8 Jun 2005 http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
6. On Linking Alternative Representations To Enable Discovery And Publishing, TAG Finding 1 November 2006 http://www.w3.org/2001/tag/doc/alternatives-discovery.html
7. The use of Metadata in URIs, TAG Finding 2 January 2007 http://www.w3.org/2001/tag/doc/metaDataInURI-31.html
8. [Editorial Draft] Extending and Versioning Languages Part 1, Draft TAG Finding 26 March 2007 http://www.w3.org/2001/tag/doc/versioning-20070326.html
9. [Editorial Draft] Extending and Versioning Languages: XML Languages, Draft TAG Finding 26 March 2007 http://www.w3.org/2001/tag/doc/versioning-xml-20070326.html
10. URNs, Namespaces and Registries, [Editor’s Draft] TAG Finding, 17 August 2006 http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
11. Passwords in the Clear, [Editor’s Draft] TAG Finding 12 December 2006 http://www.w3.org/2001/tag/doc/passwordsInTheClear-52.html
12. CURIE Syntax 1.0: A syntax for expressing Compact URIs, W3C Working Draft 7 March 2007 http://www.w3.org/TR/curie/
13. Resource Directory Description Language (RDDL) http://www.rddl.org/
14. Associating Resources with Namespaces, Draft TAG Finding 13 December 2005 http://www.w3.org/2001/tag/doc/nsDocuments/
15. namespaceDocument-8 background: Message from Norman Walsh to www-tag@w3.org on 5 March 2007 http://lists.w3.org/Archives/Public/www-tag/2007Mar/0012.html
16. Draft description of new TAG issue TagSoupIntegration-54, Message from Henry S. Thompson to www-tag@w3.org on 24 Oct 2006 http://lists.w3.org/Archives/Public/www-tag/2006Oct/0062.html
17. www-tag@w3.org Mail Archives http://lists.w3.org/Archives/Public/www-tag/
18. public-tag-announce@w3.org Mail Archives http://lists.w3.org/Archives/Public/public-tag-announce/
19. Technical Architecture Group (TAG) http://www.w3.org/2001/tag/

Author Details

Henry S. Thompson
Reader
HCRC Language Technology Group
School of Informatics
University of Edinburgh

Email: ht@inf.ed.ac.uk
Web site: http://www.ltg.ed.ac.uk/~ht/
