22 Oct 1997 ............... Length about 2800 words (17000 bytes).
This is a WWW version of a document. You may copy it. How to refer to it.

The problem of Departmental Web pages

by
Stephen W. Draper
Department of Psychology
University of Glasgow
Glasgow G12 8QQ U.K.

Contents (click to jump to a section)

1. Introduction
2. Information brokers, users, and providers
3. The "database" approach to web pages: A different analysis of the problem and how to solve it
4. Current state and prospects for adoption
5. What still remains to be done
6. Someone else's experience (Bev Taylor)

1. Introduction

Over the period May-Sept. 1997 I supervised four M.Sc. IT projects concerning web pages for university departments. The projects were by Glen Murray, Stuart Fynes, Paddy McGuire, and Graeme McLay. This memo presents some ideas that emerged from that work.

The main conclusion, in my view, is that the real problem is of a different nature than it first appeared to us, and I think to others. The unsatisfactory state of this issue in many departments comes mainly from not recognising its true nature, and what the most important attributes of the problem are. I call these "general requirements" of the problem, and present them below. The projects also gathered an extensive list of information requirements, i.e. a list of things various users would like to access from the web pages, and this list is a useful resource for future work. But the first crucial step was understanding the nature of the problem and developing a general solution.

2. Information brokers, users, and providers

Web pages are searched and read by people I will call information "users", while the information is originally provided by others I will call information "providers". The first point to grasp about departmental web pages is that, unlike in many simpler HCI situations, neither users nor providers typically have much motivation. Instead much of the motivation (perceived utility) resides with a third party I will call information "brokers": typically the heads of department. It is the brokers who wish the information exchange to be effective: who want the department to look good, the best students to apply (attracted in part by web displays), graduating students to get jobs by advertising themselves on the web, the department to impress funding bodies and research referees, and so on. In contrast, most users will only spend a few seconds looking: if they don't find what they want they will use another source or give up. Similarly, the members of staff who have to provide the information see little immediate benefit in doing so: it doesn't seem to save any other work, earn money, or make them famous overnight.

This has two consequences.

Firstly, the brokers are the only ones who are likely to devote real resources to web page creation, but they will be frustrated if they simply ask the staff to do it: their staff are not sufficiently motivated to provide the information and do more authoring. Any workable solution must avoid appearing to increase work for most staff, yet must still organise their provision of the information.

Secondly, the "task" being done by web pages is (usually) that of an information flow from provider to user, but where they usually do not meet or converse. This is in fact essentially the same as with printed documents like the prospectus. Because of that, there are very frequently large failures of communication: we have come across a number. In other words, these documents should have been tested for comprehensibility on their intended readers, but usually are not with predictable consequences.

3. The "database" approach to web pages: A different analysis of the problem and how to solve it

We studied two departments at once: Computing Science and Psychology. Interestingly, they both had very similar problems with their web pages, even though, as you would expect, the average level of computer skill and expertise was very different. Both departments had in the recent past paid money to have an individual devote themselves to producing and systematising the department's web pages. In both cases this did not lead to satisfaction for long: pages were soon out of date and not updated, and many staff members in both departments either failed to create their own web pages at all, or after an initial burst of activity left them to become seriously out of date.

In effect both departments had perceived the problem as one of programming -- designing pages and writing the "code" -- to be solved by hiring someone with skill in the relevant programming "language" (HTML), but not someone expensive, because HTML is a "simple" language. This has not produced satisfactory results. I argue here that this is because that apparently obvious perception of the problem is in fact wrong, and a much deeper analysis leading to a rather different approach is necessary. (I.e. systems analysis and design is required, not a simple job by an inexperienced programmer.)

The main requirements

What are the main requirements, particularly those implied by the unsatisfactory results of past attempts?

The first requirement is that the web pages should be updated frequently. We may be used to print documents becoming out of date because they are only produced once a year, but a wished-for benefit of web pages is that they should be up to date. Any solution must be one that keeps the pages up to date.
This was the most obvious problem with previous approaches. It will not be achieved by hiring a temporary author to create web pages. Even if a web page manager is made a permanent post, there will still be the problem of eliciting changes to the information from the providers at frequent intervals. Because the information is and must be provided by many different members of staff, the main problem is not designing computer code at all, but designing (and getting accepted) human procedures for gathering the information.

The second requirement is that it must not require extra work, at least from most staff. Part of this is real resource limitations: we do not have lots of spare staff hours to assign to this. Part of it is perception: most staff are not much motivated to support fancy web pages, so as far as possible the human procedures being designed should appear to be no extra work. Ideally they should reduce current work; certainly they should appear to add very little.
This is true in general, but it will apply most specifically to checking information: currently most information supplied by staff is supplied annually, e.g. for course handbooks. Since frequent updating implies frequent checking, this is actually a new demand on staff, and needs to be made very lightweight in their perception. The intuition in assigning admin. tasks to staff, however, is that remembering to do them and getting round to doing them take as much mental effort as actually performing the task: in other words, the reminder function may be as important as the actual task of checking and correcting.

With the "no extra work" requirement in mind, we analysed each information requirement with respect to who would have to supply it and whether they already supplied it. This led to the recognition that fortunately much of the information that would ideally be supplied was in fact already being supplied for printed documents. In Psychology, even extra information that would ideally be available on web pages about where the fax machines were etc. was to be included in a handbook being planned by the departmental adminstrator. This leads to a third requirement: to organise any solution around documents that are currently produced in print. Since staff already accept the need to produce these documents, much information for web pages is available without extra work.

However, a corollary of this is a fourth requirement: that the information, and probably its organisation into documents, needs to be produced in two forms, print and web. We require a solution that does both in order to minimise work.

The organisation of the information into documents (e.g. a course handbook) draws attention to how information provision is currently organised. Each document has an editor, rather than an author, whose role is to elicit pieces of information from many other people and then assemble these into the final document. Chasing the contributors is a significant amount of work, and publications are often delayed. Then a lot of time-consuming routine editing is required, pasting in the contributions. Again, this work seems undesirable if it can be avoided. It would be desirable to support these editors in their tasks of reminding and assembling.

Requirements summary

1. The web pages must be kept up to date: the main problem is designing human procedures for eliciting frequent updates, not writing HTML.
2. The solution must not require perceptible extra work from most staff; in particular, checking information must be made very lightweight, with reminders provided.
3. The solution should be organised around documents that are already produced in print, so that most of the information is supplied without extra work.
4. The information (and its organisation into documents) must be produced in two forms, print and web, from the same source.
5. Document editors should be supported in their tasks of reminding contributors and assembling documents.

The "database" solution: technical outline

With those requirements in mind, the solution I propose is to divide the information into many small chunks, store these chunks in a "database" where they may be individually updated as necessary, and generate from the database two types of document: print documents, and web pages.

The present implementation, created in demonstration form by the IT projects, holds the data in Unix files. "Skeleton documents" contain special tags referring to data items, and Perl scripts running under Unix process these documents by replacing the tags with the current values of the corresponding data items. Each document is held in two forms: an HTML skeleton document to generate web pages, and an RTF skeleton document to produce a printable Word document. Initially these may be very close in form, e.g. the web skeleton may be created by processing the original Word document (using the conversion program "rtf2html"), but they could be completely unrelated, and will probably diverge in time as the format requirements of each are different.
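
To give the flavour of the mechanism, a minimal sketch of such a tag-replacement script might look like the following. The tag syntax ([[item:...]]), the file names, and the "database" file layout (one item per line: name, value, and time last checked, tab-separated) are invented for illustration and are not necessarily those used by the projects' scripts.

    #!/usr/local/bin/perl -w
    # fill_skeleton.pl -- illustrative sketch of the tag-replacement step.
    # Assumed "database" file items.db: one item per line,
    #   name <tab> value <tab> last-checked (Unix time)
    # Tags in skeleton documents are assumed to look like [[item:NAME]].

    # Read the "database" of current values into a hash.
    open(DB, "items.db") or die "cannot open items.db: $!";
    my %value;
    while (<DB>) {
        chomp;
        my ($name, $val, $checked) = split(/\t/, $_, 3);
        $value{$name} = $val;
    }
    close(DB);

    # Copy the skeleton document to standard output, substituting each tag.
    open(SKEL, $ARGV[0]) or die "cannot open skeleton $ARGV[0]: $!";
    while (<SKEL>) {
        s/\[\[item:([\w.]+)\]\]/defined $value{$1} ? $value{$1} : "??$1??"/ge;
        print;
    }
    close(SKEL);

Such a script would be run once over the HTML skeleton (e.g. "perl fill_skeleton.pl handbook.skel.html > handbook.html") and once over the RTF skeleton to regenerate both forms of the document.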

A forms interface on the web allows data items to be updated at any time: this is how providers can check and update the items for which they are responsible. (The scripts will then run, the updated web pages will appear soon after, and the print document will be updated too.) A reminder system is also provided. This can be set to check every data item, and if any item has not been checked for longer than its assigned update interval, its provider is reminded by email. Every data item stores the time it was last checked; the update interval can be set separately for each type of item; and updating an item (with or without modification) resets its date. A document editor can also send out reminders for all the items in one document by issuing a single command.
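
A sketch of the reminder check, under the same invented file layouts as above (plus a "metadatabase" file giving each item's provider and update interval), might be along these lines; the mail command and the field order are again assumptions, not the projects' actual code.

    #!/usr/local/bin/perl -w
    # remind.pl -- illustrative sketch of the reminder system.
    # Assumed files:
    #   items.db:   name <tab> value <tab> last-checked (Unix time)
    #   items.meta: name <tab> file <tab> provider-email <tab> interval-days

    # Read the time each item was last checked from the "database".
    open(DB, "items.db") or die "cannot open items.db: $!";
    my %checked;
    while (<DB>) {
        chomp;
        my ($name, $val, $when) = split(/\t/, $_, 3);
        $checked{$name} = $when;
    }
    close(DB);

    # For each item in the "metadatabase", email its provider if it is overdue.
    my $now = time;
    open(META, "items.meta") or die "cannot open items.meta: $!";
    while (<META>) {
        chomp;
        my ($name, $file, $provider, $interval) = split(/\t/);
        next unless defined $checked{$name};
        next if $now - $checked{$name} < $interval * 24 * 60 * 60;
        open(MAIL, "| /usr/lib/sendmail -t") or die "cannot run sendmail: $!";
        print MAIL "To: $provider\n";
        print MAIL "Subject: please check the web data item '$name'\n\n";
        print MAIL "'$name' has not been checked for more than $interval days.\n";
        print MAIL "Please use the update form to confirm or correct it.\n";
        close(MAIL);
    }
    close(META);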

I refer to the data storage mechanism as a "database" in quotes because it does not currently use a proper database management system, only ad hoc files. The approach does fully adopt the database idea of dividing what is to be stored into small items that can be separately retrieved, each of which is stored in only one place even though typically used in many places. However, other features of a database system are not critical in this application: storage economy, large volumes of data, efficient indexing, complex relationships requiring data to be combined from several tables, and the standard query language SQL. Furthermore, one thing that is required is not currently provided by database software: embedding retrieval queries (cf. our tags) in several different standard document types (HTML and Word).

As well as the "database" that holds the current values of each item, there is a "metadatabase" that holds information about the permanent attributes of each item, including: where it is stored (i.e. how it can be retrieved), who its provider is (i.e. who has to check and update it), its update interval (e.g. every month, once a year). Thus more documents can be added to this system by adding their component items to the "metadatabase", checkers and intervals can be changed and so on without having to edit the scripts.

The solution from the users' viewpoint

Once set up, this is how it would appear from the viewpoint of department staff. Providers could update data at any time, and updates would appear nearly immediately in web documents, and be ready in print documents. Editors (responsible for particular documents such as phone lists or course handbooks) can set the reminder system to prompt providers to verify information every so often (e.g. once a month), or to update now (more appropriate for handbooks). In principle the documents will be reassembled automatically as new data arrives. In practice, the editor will probably go over the document (perhaps 10 mins. work?) adjusting layout as necessary.

The document structure itself, for either the web or the print version, can be changed at any time by using a normal editor on the "skeleton document", with tags standing for the data items. No new skills are required apart from using the tag convention. It won't be properly WYSIWYG, however, as the editor sees only the tags rather than the data, though it takes only a short delay to regenerate the whole document with the data substituted in. Completely new documents reusing existing data can easily be made by writing extra skeleton documents. For example, various versions of default staff web pages could be created quickly using the same data as is now maintained for the staff phone list.
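
For example, a fragment of the HTML skeleton for a phone list might look like this, using the [[item:...]] tag convention assumed in the sketches above (the item names are invented):

    <h2>Departmental telephone numbers</h2>
    <ul>
    <li> Head of Department: [[item:staff.head.name]], ext. [[item:staff.head.phone]]
    <li> Departmental office: ext. [[item:office.phone]], fax [[item:office.fax]]
    </ul>

Running the tag-replacement script over this skeleton regenerates the page with whatever values are currently held in the "database".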

The proposed approach is thus designed to support web documents and to keep them up to date with little perceived extra work for providers and editors, and perhaps less than is currently required for the printed documents alone. There will however be some setup costs for converting existing documents to this system, and of course new information not presently supplied will require new work.

4. Current state and prospects for adoption

Current state

The tag replacement software, reminder system, and data update web interface are all currently working on two demonstration documents: a staff phone list, and a small test course handbook document. The software, a collection of Perl scripts on a Unix machine, will realistically require some future maintenance, as the scripts are too new to be robust yet. I expect this all to be adopted in Computing Science, as the Information Officer, Jon Ritchie, is enthusiastic: he will no doubt maintain a version of the scripts.

The update forms require a web server with CGI scripts enabled.
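
To indicate the scale of script involved, a stripped-down update handler might look like the following, using the CGI.pm module and the same invented file layout as the earlier sketches; a real script would also lock the file and validate the input.

    #!/usr/local/bin/perl -w
    # update.cgi -- illustrative sketch of the data update handler.
    # Rewrites items.db (name <tab> value <tab> last-checked) with the new
    # value and the current time, so the item also counts as freshly checked.
    use CGI;

    my $q = new CGI;
    my $item  = $q->param('item');     # which data item is being updated
    my $value = $q->param('value');    # its new value

    open(DB, "items.db") or die "cannot open items.db: $!";
    my @lines = <DB>;
    close(DB);

    open(DB, ">items.db") or die "cannot rewrite items.db: $!";
    foreach my $line (@lines) {
        my ($name) = split(/\t/, $line);
        if ($name eq $item) {
            print DB join("\t", $item, $value, time), "\n";
        } else {
            print DB $line;
        }
    }
    close(DB);

    print $q->header;
    print "<p>Thank you: the item '$item' has been updated.</p>\n";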

Although currently the data is held in Unix files, it would be simple to adapt the system to take data from a true database management system on the same machine as the web server and Perl scripts.
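
For instance, the loop in the tag-replacement sketch above that reads items.db could be replaced by a query through the Perl DBI module; the driver, database, table and column names below are all assumptions for illustration.

    use DBI;

    # Connect to a local database (the DSN and account are placeholders).
    my $dbh = DBI->connect("dbi:mysql:deptinfo", "webuser", "secret",
                           { RaiseError => 1 });

    # Load all current item values into the same %value hash used above.
    my %value;
    my $sth = $dbh->prepare("SELECT name, value FROM items");
    $sth->execute;
    while (my ($name, $val) = $sth->fetchrow_array) {
        $value{$name} = $val;
    }
    $dbh->disconnect;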

Setup and conversion costs

To use this approach requires documents to be converted (or created) to fit it. For instance, the staff phone list might currently be held as a spreadsheet. This can be saved in ascii form (tabs separating columns, linefeeds separating rows), and the resulting file used as the "database" file. Entries must then be made in the "metadatabase" to describe the data. Two skeleton documents must then be created with tags in place of the data items. However, more than one web page can easily be created: for instance both a phone list and a set of basic staff web pages.
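
To make the steps concrete, here is how one invented row might pass through the stages, using the tag convention of the sketches above (the file layouts are again only illustrative, and shown spaced out rather than tab-separated):

    Row of the spreadsheet, saved as a tab-separated "database" file:
        Smith, A.    3456    Room 123

    Corresponding "metadatabase" entries (item name, file, provider, interval in days):
        staff.smith.name     phones.db    a.smith@dept.ac.uk    365
        staff.smith.phone    phones.db    a.smith@dept.ac.uk    365
        staff.smith.room     phones.db    a.smith@dept.ac.uk    365

    Matching line in the HTML skeleton for the phone list:
        <li> [[item:staff.smith.name]], ext. [[item:staff.smith.phone]], [[item:staff.smith.room]]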

Another example might be a course handbook. Here many decisions must be taken about how to divide it up into data items. For each data item, an entry must be made in the "metadatabase", the item's content stored in a "database" file, and that content replaced by a tag to convert the original document into a skeleton document. The web skeleton document can probably be created by a direct conversion of the Word skeleton document into HTML. This process is time consuming: perhaps a day's work for someone who has already got used to this approach and learned the meaning and format of the "metadatabase". However, once this is done, little work will be required to create a new edition: the new items will be collected using the reminder system with little work by the editor, and a Word document with all the new items will be assembled automatically. It can then be manually edited if the formatting (e.g. page breaks) needs adjusting before printing.

5. What still remains to be done

Introducing the "database" approach

All the above will still only create and maintain a web version of current paper documents, although with the update problem thoroughly addressed. This will probably cover the majority of what is in fact required on departmental web pages, and could directly encompass many things, e.g. the list of departmental seminars (currently printed on ad hoc sheets, or just emailed).

Systematic design, redesign, and testing

What remains is a more systematic approach to covering and effectively delivering all the information required. If ever done (I know of no dept. that has addressed this properly yet), this would mean going over the full list of information requirements, creating new documents (at least web pages) so that all are covered, and testing these documents on users to ensure the information is successfully communicated.

6. Someone else's experience (Bev Taylor)

Here is someone else's experience (personal communication).

From: Beverley Taylor (mstbmtx@panther.gsu.edu) 22 Oct 1997
Hi Steve. I just read about your departmental web page and the database approach and thought I'd share ours--not that it is spectacular or anything--just the approach. Our web page was developed a few quarters ago by myself (as a graduate research assistant/PhD student in IT) and two MS students from CIS who had a class assignment. We used the university's server for access to the main page and any pages that don't require the database. But, in terms of updating and ongoing maintenance of departmental materials, we set up one of the better machines on the floor (not an NT though) as a website server. The students downloaded a trial version of a program called "Cold Fusion" that is similar to HTML code and interfaces with Microsoft Access database (though you could use any SQL database program I think). I'm not a techie..so I don't know a lot beyond that.
[ http://www.gsu.edu/~wwwmst Click on People or Program Info to see the database part...]

Now that the system is set up and those GRA's graduated, we hired a different GRA talent. She just started, but she's an IT student with a publications background. During this phase our goal is to add unique content and expand existing database content (and fix the English--both CIS students were from China!).

As for myself--I am graduating this quarter. My dissertation focused on sharing motivational strategies with learners who are more extrinsically motivated than intrinsically motivated for going to college. I developed a web-based program but needed a programmer's help for cgi/PERL stuff. That's why I think this Cold Fusion approach may be better for less-techie types. If you want to see the program (click on 'About MTV' for directions), it is at http://www.gsu.edu/~mstbmtx/mtv

Bye! Bev
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Beverley Taylor, Instructional Technology
bev_taylor@mindspring.com
http://www.gsu.edu/~mstbmtx
Knowledge Shared Multiplies
* * * * * * * * * * * * * * * * * * * * * * * * * * *