Securing HTML FORMs
There are now many HTML FORMs in use in libraries of all types. These forms are usually the front ends to Common Gateway Interface (CGI) programs that implement such things as webopacs, electronic ILL requests and access to backend databases. Often these FORMs have to be authenticated to ensure that only valid library users can make use of the services. Web browsers and servers offer a number of facilities for doing this authentication, each with different benefits and problems. This article provides a brief overview of the choices available.
The Password TYPE in <INPUT>
The <INPUT> element is used in HTML FORMs to allow the user to enter data and make selections from check boxes and radio buttons. One of the available TYPE attributes of the <INPUT> element understood by most FORMs-capable web browsers is password. This type provides a text input box for the user, but the browser does not actually render the characters that are entered. Instead, they are replaced by asterisks, or in some cases nothing at all. When the FORM is submitted to the server, the contents of this element are transmitted just as though they had come from a normal text input box.
This mechanism prevents other users from seeing a user's password on the screen as it is being entered. It has the advantages that it is easy for the user to use, easy for the FORM/CGI script designer to use and is widely implemented in web browsers.
However, preventing the password from being seen on the screen is really the only security that it provides. The password itself is sent over the network in the clear, which means that anybody monitoring the network traffic will be able to see the password. It is therefore only really suitable for situations where the FORM is used solely over a physically secured local area network. Also, other users can often use the browser's back function to go back to the FORM, change other fields and then resubmit the FORM with the same, valid password. This is especially important to note in public access situations where a single machine is used by many users.
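To make the "in the clear" point concrete, the following minimal sketch shows the kind of request body a browser builds from a submitted FORM. The field names and values are hypothetical, and Python's standard urllib.parse module stands in for the browser's own encoding step.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical field names and values, as a browser might collect them
# from a text <INPUT> and a password <INPUT>.
fields = {"user": "alice", "password": "s3cret"}

# The body is plain, URL-encoded text. TYPE=password changes only what
# is drawn on the screen, not what is transmitted.
body = urlencode(fields)
print(body)

# Anyone who captures the body can read the password straight back out.
recovered = parse_qs(body)["password"][0]
print(recovered)
```

Nothing in the encoding step depends on the TYPE of the <INPUT> element, which is why an eavesdropper sees the password exactly as the CGI script does.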
Timing out the Password TYPE
To overcome the last of the difficulties with the password TYPE, one can take advantage of the fact that the FORM is providing a front end to a CGI script to add some extra cryptography. One way of doing this is to generate a cryptographic hash value based on the time the original FORM was generated concatenated to a secret value known only to the server that generated the form. This cryptographic hash can be generated by one of the Message Digest series of cryptographic functions such as MD4 (see RFC 1320). It is then concatenated with the plain text version of the time and placed in an <INPUT> element with a hidden TYPE in the FORM, along with the <INPUT> element with the password TYPE and any other FORM elements. The hidden <INPUT> element forms a time limited session ID for the password entry.
When this FORM is returned by the browser, the first thing that the CGI script has to do is to split the session ID into its two constituent parts: the plain text time and the hash. It then recomputes the hash value using the plain text version of the time and the server's secret value, and checks to see if it matches the hashed value it received in the session ID. If the two hash values don't match then someone has tried to tamper with the password entry form and it is rejected. If the two hash values are the same, the script then examines the plain text form of the time and checks to see whether it is still valid. It does this by checking whether the plain text time value is older than the current time minus some predefined timeout value. If it is older, then the password entry FORM has timed out and a new FORM should be generated asking the user to reenter his password.
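The generate-and-check cycle described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the secret, the timeout, the "time:hash" layout and the field name session_id are all assumptions, and HMAC-SHA-256 from Python's standard library stands in for the plain MD4 concatenation, since MD4 is no longer considered secure and is often unavailable in current libraries.

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"change-me"   # hypothetical secret known only to the server
TIMEOUT_SECONDS = 120          # hypothetical timeout

def _digest(timestamp: str) -> str:
    # Keyed hash over the plain text time (HMAC-SHA-256 is a stand-in).
    return hmac.new(SERVER_SECRET, timestamp.encode("ascii"),
                    hashlib.sha256).hexdigest()

def make_session_id(now=None) -> str:
    # Session ID layout (an assumption): "<plain text time>:<hash>".
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_digest(ts)}"

def hidden_field(session_id: str) -> str:
    # The hidden <INPUT> that carries the session ID in the FORM.
    return f'<INPUT TYPE="hidden" NAME="session_id" VALUE="{session_id}">'

def check_session_id(session_id: str, now=None) -> str:
    """Return 'ok', 'tampered' or 'expired'."""
    ts, sep, received = session_id.partition(":")
    if not sep or not (ts.isascii() and ts.isdigit()):
        return "tampered"
    # Recompute the hash from the plain text time and compare.
    if not hmac.compare_digest(_digest(ts).encode(), received.encode()):
        return "tampered"
    # Reject the FORM if it is older than the current time minus the timeout.
    current = time.time() if now is None else now
    if int(ts) < current - TIMEOUT_SECONDS:
        return "expired"
    return "ok"
```

A script would call make_session_id() when it generates the password FORM and check_session_id() before it looks at the password field at all. On "expired" it would issue a fresh FORM, and on "tampered" it would reject the request outright.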
The technique presented so far prevents people from using the back function on public access browsers to use someone else's password after a certain timeout period. Even though they can go back to the FORM and resubmit it, after the timeout period it will be rejected by the script. This is much the same as existing text based OPACs that assume that if the user has not done anything for a couple of minutes then that session has ended and a new session should be started.
Sometimes we may wish to have passworded access to a whole series of HTML FORMs. Rather than having to ask the user for his password on every FORM, we can use cryptographic hashes to embed a hidden, time limited, hashed form of the password in subsequent FORMs. The way to do this is to generate a propagated ID in the CGI script after the user's password has been authenticated from the first FORM. This propagated ID contains a plain text time, a plain text user name and then a cryptographic hash. This time the cryptographic hash is formed by running the hashing function over the concatenation of the time, the server's secret value, the user name and the user's password.
In this case each subsequent script first splits the propagated ID up into its three constituent components. It then looks up the user's password and recomputes the hash value. This new hash is compared to the one that was in the propagated ID; if the two do not match then the FORM has been tampered with. If they do match, the time value is checked to ensure that it is within the timeout period and, if it is, the real operations of the CGI script can proceed.
This technique has the advantages that the user only needs to enter his password once (unless he waits so long that the timeout occurs), it provides timeouts to a series of FORMs, it is widely supported by web browsers and no modifications are required to the HTTP server's configuration. The disadvantages are that more code is required in the CGI scripts and the password still appears in plain text once on the network.
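A propagated ID along these lines might be built and checked as in the sketch below. The user table, secret, timeout, "time:user:hash" layout and the NUL separator between hashed fields are all assumptions made for illustration, and HMAC-SHA-256 again stands in for the Message Digest function.

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"change-me"        # hypothetical server secret
TIMEOUT_SECONDS = 120               # hypothetical timeout
PASSWORDS = {"alice": "s3cret"}     # hypothetical user database

def _digest(ts: str, user: str, password: str) -> str:
    # Hash covers the time, user name and password, keyed by the secret.
    message = "\0".join((ts, user, password)).encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()

def make_propagated_id(user: str, password: str, now=None) -> str:
    # Called once the password from the first FORM has been authenticated.
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{user}:{_digest(ts, user, password)}"

def check_propagated_id(pid: str, now=None):
    """Return the user name if the ID is genuine and fresh, else None."""
    # Split into the three components: time, user name, hash.
    ts, _, rest = pid.partition(":")
    user, _, received = rest.rpartition(":")
    password = PASSWORDS.get(user)          # look up the user's password
    if password is None or not (ts.isascii() and ts.isdigit()):
        return None
    expected = _digest(ts, user, password)
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return None                         # FORM has been tampered with
    current = time.time() if now is None else now
    if int(ts) < current - TIMEOUT_SECONDS:
        return None                         # timed out
    return user
```

Because the password is an input to the hash but never appears in the ID itself, the hidden field reveals only the time and the user name to anyone who views the FORM's source.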
HTTP Authentication
The authentication techniques presented so far all rely upon the CGI program performing some form of authentication operation. However sometimes you may wish to make use of a CGI script that does not have any built-in authentication operations and also may not provide the facility for them to be added locally (it may be distributed only in binary form for example). In this case, it may be worthwhile making use of the built in authentication in the HTTP protocol.
The HTTP authentication mechanism has to be turned on by appropriately configuring your HTTP server. Exactly how this is done varies from server to server, but it usually involves editing some of the configuration files in the server's home directory and then restarting the HTTP server process. The authentication can usually be applied on a directory by directory basis, and sometimes even a file by file basis, and there are a number of different authentication mechanisms defined. The most widely implemented mechanism is called Basic authentication and it offers much the same security as the simple password TYPE <INPUT> element. However, more secure mechanisms such as Digest and one time cryptographic passwords are being standardised and are likely to be widely implemented in the near future.
The HTTP authentication causes the HTTP server to check for authentication information in the HTTP request. If none is present, it returns an HTTP error code to the browser. This in turn causes the browser to request authentication information from the user, which it then uses to resubmit the request. Once the authentication has been successfully achieved, most browsers will continue to send it to the same server host for subsequent HTTP requests. This means that it is again possible for other users to make use of the browser's back function to reuse a user's valid authentication after he has left a shared public access machine. Some of the newer authentication mechanisms being proposed for HTTP may help overcome this, but unfortunately the widely implemented and used Basic mechanism is susceptible to it. HTTP based authentication also has the disadvantage of having to give CGI script authors/implementors access to the HTTP server's configuration files, which some HTTP server administrators may not wish to do.
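To illustrate why Basic authentication is no stronger on the wire than the password TYPE, the sketch below builds and then decodes the credential a browser sends after the server's 401 challenge. The helper names are hypothetical, and only Python's standard base64 module is used.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    # Value of the Authorization request header under the Basic scheme:
    # "user:password", base64-encoded. This is an encoding, not encryption.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value: str):
    # What the server (or an eavesdropper) does to recover the credentials.
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

Since the browser repeats this same header on later requests to the host, anyone who captures a single request, or who sits down at the same browser session, holds a reusable credential.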
Secure Sockets Layer
One solution to the problem of sending passwords and other valuable data such as credit card numbers in plain text over the network is to encrypt the whole of an HTTP transaction. The main standard in this area is Netscape's Secure Sockets Layer (SSL) which is implemented in their servers and clients. A number of other commercial servers and browsers also implement SSL and there is an SSL module available for the popular Apache HTTP server as well.
SSL works by establishing an encrypted session on a separate port of the HTTP server (usually port 443 rather than the normal port 80). The client then performs what is known as a handshake during which cryptographic information is exchanged between the client and the server in order for them to authenticate each other and exchange cryptographic keys. SSL supports multiple encryption and cryptographic hashing algorithms, thereby allowing an upgrade path to newer, stronger cryptographic functions and the ability to "blackball" algorithms that have been compromised.
SSL offers a much higher level of security than the previous examples by encrypting the entire session, even if it is composed of multiple HTTP transactions. Unfortunately the authentication performed is based on the client rather than the user, and so it is not necessarily appropriate for use with public access browsers unless it is used in combination with one of the techniques above (preferably the time limited passwords on FORMs) or with some external hardware that provides the SSL authentication certificates. SSL also suffers from not being as widely implemented as the previous technologies and has to some extent been crippled by the United States restrictions on the export of cryptographic technology.
Conclusions
As can be seen from this brief overview, the designer or implementor of CGI program(s) and HTML FORMs has a variety of authentication mechanisms available to them. The appropriate mechanism will vary depending upon the environment and task in hand, the facilities available and the amount of effort that they wish to spend. For example, one might use just the <INPUT> element's password TYPE to secure access to an ILL database on a machine that is only available to private library staff machines over a secure physical network. At the other end of the spectrum one might use SSL to encrypt time limited password access to a FORM that requires credit card information to be presented from a public access workstation. The choice, as they say, is yours.