What Is a URI?
Users of the Web are familiar with URLs, the Uniform Resource Locators. A URL is a locator for a network-accessible resource, and such a locator can be considered an identifier for the resource to which it refers. Depending on how identification is interpreted, many different attributes of a resource could serve as an identifier for it. However, what makes a functional resource identifier depends upon the context in which that identifier will be used. For example, in a group of five people, identifying individuals by weight is unlikely to be practical. In many situations, we assign a name to an object and use this attribute as the object's identifier. Such names also have to be chosen with regard to the context in which they will be used in order to be functional. Returning to the example of a group of people, we may refer to a particular person by a combination of forename and surname. This name would probably identify a particular person adequately in a group of five.
Uniform Resource Identifiers (URIs) are character strings, conforming to a generic URI syntax, that are used to identify resources. A URI provides a simple and extensible means of identifying a resource, which can then be used within applications. The URI specification follows the recommendations set out in various functional requirements documents (see Further information below).
URIs form a superset of three distinct groups of identifiers, which will be described further on. They are:
- URLs - Uniform Resource Locators
- URNs - Uniform Resource Names
- URCs - Uniform Resource Characteristics
These identifiers, and the generic URI, are formally specified in various IETF Internet Drafts and RFCs.
What is a resource?
We can consider that a resource is anything to which we can attach identity. A resource arises through a conceptual mapping to an identified entity. Such identity does not necessarily imply network (or other) accessibility. Since the mapping is conceptual, the entity itself may not be constant (e.g. a book being written changes over time) or even instantiated at any given time (e.g. the contents of a noticeboard could be empty).
Some examples of resources are listed below:
- A physical noticeboard
- The contents of a noticeboard
- All people within the University of Bath
- A kitchen
- A book
- A sentence from a book
- A book in the process of being written
- A GIF image
- An HTML document
- A postscript document residing on an FTP server
What is an identifier?
An identifier is an object that acts as a reference to something that has identity (i.e. a resource). If the resource is accessible, the identifier may be dereferenced to obtain it. Note that at this level we have not specified that an identifier be unique.
Some examples of identifiers are listed below:
- A forename and a surname
- A postcode
- An ISBN
- An ISBN and a page reference
- A filename
- A description (e.g. colour, shape, weight)
Confusingly, we sometimes identify an entity by a name that is also another attribute of the entity. For example, we could label a particular person as “Jim with brown hair”.
In the context of URIs, the identifier is a string of characters conforming to the URI syntax. Further restrictions on the syntax classify a URI as a URL, URN or URC, i.e. as belonging to a particular class of identifier. Very broadly, a URN is a name (which should be globally unique), a URC is a resource description, and a URL specifies a resource location.
What is uniformity?
The uniformity of the URI is inherited by URLs, URNs and URCs. Uniformity refers to the strict syntax to which a URI must conform. A URL, URN or URC must in addition follow a more class-specific syntax, designed to best serve the purpose of its class.
Uniformity provides a number of benefits:
- It allows the introduction of new types of resource identifiers without interfering with the way that existing identifiers are used.
- It allows a uniform interpretation of semantic conventions across different types of resource identifiers (for example, it is conventional for URL schemes to represent the method of network access).
- It allows different types of resource identifier to be used in the same context. For example, a URL and a URN can both refer to the same resource.
- It allows identifiers to be reused in different contexts (thus permitting new applications or protocols to leverage a pre-existing set of resource identifiers).
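To make the idea of a uniform syntax more concrete, the following sketch splits identifiers from different schemes into the same generic components. It uses the regular expression given in Appendix B of RFC2396; the function name `split_uri` and the example identifiers (including the `ftp.example.org` host and the ISBN) are illustrative assumptions, not part of any specification.

```python
import re

# Regular expression from RFC2396, Appendix B, for breaking a URI reference
# into its generic components. Every part is optional, so it always matches.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split any URI into the generic components (hypothetical helper)."""
    match = URI_PATTERN.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


# Illustrative identifiers only: a URL-style and a URN-style URI are handled
# by exactly the same code, which is the practical benefit of uniformity.
for example in (
    "http://www.ukoln.ac.uk/",
    "ftp://ftp.example.org/pub/paper.ps",
    "urn:isbn:0451450523",
):
    print(example, "->", split_uri(example))
```

A component that is absent from an identifier is reported as `None`, so the URN-style example has a scheme and a path but no authority.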
The URL
The best-known identifier is probably the URL. A URL identifies a network-accessible resource by a scheme (which conventionally represents the primary access mechanism), a machine name and a “path”. The path is interpreted in a manner that depends on the scheme.
URLs make the most varied use of the URI syntax and often have a hierarchical namespace (e.g. when specifying a directory path in an HTTP scheme URL). Currently, URLs are often treated as both a name and a location for a resource. This is bad practice, since URLs may be transient and a URL defines exactly one location, even though a resource may exist in multiple locations. In the larger Internet information architecture, URLs will act only as locators.
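As a rough illustration of the three parts described above, the sketch below uses Python's standard `urllib.parse.urlsplit` to pull the scheme, machine name and path out of a URL. The helper name `describe_url` and the example paths and FTP host are assumptions made for this example.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return the (scheme, machine name, path) of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


# The paths and the FTP host below are made up for illustration.
print(describe_url("http://www.ukoln.ac.uk/metadata/index.html"))
print(describe_url("ftp://ftp.example.org/pub/paper.ps"))
```

The same three components come back for both schemes; what the path means (a document served over HTTP, a file fetched by FTP) is left to the scheme.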
The URN
Whereas a URL identifies the location or container for an instance of a resource, a URN, in the Internet architecture, identifies the resource. The resource identified by a URN may reside in one or more locations, may move, or may not be available at a given time.
The URN has two practical interpretations, both for network-accessible resources. The first is as a globally unique and persistent identifier for a resource, achieved through institutional commitment. The second interpretation is as the specific “urn” scheme, which will embody the requirements for a standardised URN namespace. Such a scheme will resolve names that have a greater persistence than that currently associated with URLs. Work is still in progress on standardising this scheme.
A functional requirements standard for URNs (RFC1737) lays down a number of properties that URNs should embody. These include global scope, global uniqueness and persistence.
A number of URN resolving services currently exist; see the applications mentioned below.
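The “urn” scheme syntax itself is given in RFC2141 (listed under Further information) as `urn:<NID>:<NSS>`, where the NID is a namespace identifier and the NSS is a namespace-specific string. The following is a simplified sketch of a parser for that form. The function name `parse_urn` and the example ISBN are assumptions, and the sketch does not check the full character set that RFC2141 permits in the NSS.

```python
import re

# "urn:" prefix, then a namespace identifier (NID) of 1 to 32 letters, digits
# or hyphens that does not start with a hyphen, then the namespace-specific
# string (NSS). Simplification: any non-empty NSS is accepted here.
URN_PATTERN = re.compile(
    r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE
)


def parse_urn(urn):
    """Return (nid, nss) for a URN, or raise ValueError (hypothetical helper)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError("not a URN: %r" % urn)
    nid, nss = match.groups()
    # The "urn" prefix and the NID are case-insensitive, so normalise the NID.
    return nid.lower(), nss


print(parse_urn("URN:ISBN:0451450523"))  # illustrative identifier only
```

Note that nothing in the name says where the resource is held; mapping the name to one or more locations is the job of a separate resolution service.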
The URC
The Internet draft “URC Scenarios and Requirements” defines the URC:
“The purpose or function of a URC is to provide a vehicle or structure for the representation of URIs and their associated meta-information.”
Initially, URCs were envisioned as the intermediary that associated a URN with a set of URLs, which could then be used to obtain a resource. Later it was decided that metadata should also be included, so that resources conforming to a set of requirements could be obtained. URCs are essentially descriptions of resources available via a network.
Although work has been carried out by the IETF URC working group, URCs still do not exist in practice. It seems unlikely at present that URCs will become standardised.
Use of URIs
The problems of using a locator (i.e. a URL) as a name have already been mentioned. URIs tackle addressing for the future Internet architecture. Resources will be identified by a URN, which will be resolved via a URN resolution service. Currently, it looks unlikely that URCs will have a large part to play in the process.
Official URI-related standardisation has been slow, though a number of URN resolution services now exist adhering to accepted conventions. For more details, see the TURNIP [9] pages.
A number of applications have been built around the URN concept, including:
- The Digital Object Identifier System (DOI) [10]
  The goals of the DOI system are to provide a framework for managing intellectual content, link customers with publishers, facilitate electronic commerce, and enable automated copyright management. The underlying technology is based on the Handle resolution system. The components of the system are an identifier, a directory (the basis for a resolution system) and a database (containing object information).
- The Handle System [12]
  The Handle System, developed by CNRI, is a distributed global system which stores names, or handles, of digital objects and which can resolve those names into locators to access the objects.
- PURLs [11]
  Persistent URLs, or PURLs, were developed by OCLC as an interim naming and resolution system for the Web. PURLs increase the probability of correct resolution and thereby reduce the burden and expense of catalogue maintenance. A PURL is functionally a URL. However, a PURL refers to a resolution service, which maps the PURL to a URL and returns this to a client. On the Web, this process is a standard HTTP redirect.
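The PURL mechanism can be sketched as a simple lookup table that answers each request with a redirect. This is an in-memory illustration only, not OCLC's implementation: the table contents, the function names `resolve` and `move_resource`, the example paths and the choice of status code 302 are all assumptions made for this example.

```python
# Hypothetical resolution table: persistent path -> current location (URL).
PURL_TABLE = {
    "/net/example-report": "http://www.example.org/reports/1998/report.html",
}


def resolve(purl_path):
    """Return an HTTP-style (status, headers) pair for a PURL path."""
    target = PURL_TABLE.get(purl_path)
    if target is None:
        return 404, {}
    # Assumed status code: the client is redirected to the current URL.
    return 302, {"Location": target}


def move_resource(purl_path, new_url):
    """Record that a resource has moved; the PURL itself does not change."""
    PURL_TABLE[purl_path] = new_url


print(resolve("/net/example-report"))
move_resource("/net/example-report", "http://archive.example.org/report.html")
print(resolve("/net/example-report"))
```

The point of the design is that when a resource moves, only the table entry is updated; every catalogue record that cites the PURL continues to resolve without being edited.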
Further information
- RFC1630, “Universal Resource Identifiers in WWW”, <URL: http://ds.internic.net/rfc/rfc1630.txt>.
- RFC1736, “Functional Recommendations for Internet Resource Locators”, <URL: http://ds.internic.net/rfc/rfc1736.txt>.
- RFC1737, “Functional Requirements for Uniform Resource Names”, <URL: http://ds.internic.net/rfc/rfc1737.txt>.
- RFC1738, “Uniform Resource Locators (URL)”, <URL: http://ds.internic.net/rfc/rfc1738.txt>.
- RFC2141, “URN Syntax”, <URL: http://ds.internic.net/rfc/rfc2141.txt>.
- RFC2396, “Uniform Resource Identifiers (URI): Generic Syntax”, <URL: http://ds.internic.net/rfc/rfc2396.txt>.
- “URC Requirements and Scenarios”, <URL: http://www.acl.lanl.gov/URI/Scenarios/scenarios.txt>.
- W3C page on Addressing, <URL: http://www.w3.org/Addressing/>.
- TURNIP, the URN Interoperability Project, <URL: http://www.dstc.edu.au/RDU/TURNIP/>.
- The DOI Foundation, <URL: http://www.doi.org/>.
- PURL Homepage, <URL: http://purl.oclc.org/>.
- The Handle System, <URL: http://www.handle.net/>.
Author Details
Ian Peacock
Email: i.peacock@ukoln.ac.uk
UKOLN: http://www.ukoln.ac.uk/
Tel: 01225 323570
Address: UKOLN, University of Bath, Bath, BA2 7AY