Visualisation by query or by metadata

Stephen W. Draper
GIST (Glasgow Interactive Systems cenTre)
University of Glasgow
Glasgow G12 8QQ U.K.


Draft of a note arising out of Fadiva2 (20-22 July 1995, Glasgow).


One of the most interesting points for me from the Fadiva2 meeting, although it was only very briefly discussed, arose from a question to Antonio Massari. The answer emphasised that in Antonio's software, the (virtual reality) visualisation is selected on the basis of each specific user query and the data set returned from the database in reply. This is quite different from the approach of many other researchers, who design visualisations based on the structure of the data in some database, independent of any user query: i.e. based on the metadata. These are two main points on a spectrum of possibilities for designing or automatically generating visualisations, which in principle has at least these alternatives:
1. General visualisation methods for any data at all.
2. Based on the metadata for one application, i.e. one domain.
3. Based on a particular database, i.e. both the metadata and knowledge of a particular dataset: e.g. it could take account of how large the collection is, or of the fact that in that collection one set (e.g. teachers) is an order of magnitude smaller than another set (e.g. students).
4. Based on particular standard user queries fixed in advance.
5. Based on the particular set returned by one specific query to one specific database.
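
The difference between alternatives (2) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the type labels, the visualisation names and the threshold of 10 items are all hypothetical, and none of them describes any of the systems mentioned in this note.

```python
from typing import Dict, List

# Hypothetical names throughout: an illustration of the idea only.


def choose_by_metadata(column_types: Dict[str, str]) -> str:
    """Alternative (2): look only at the declared types of the attributes."""
    numeric = [name for name, kind in column_types.items() if kind == "number"]
    if len(numeric) >= 2:
        return "scatter plot"
    if len(numeric) == 1:
        return "bar chart"
    return "table"


def choose_by_result(column_types: Dict[str, str], rows: List[dict]) -> str:
    """Alternative (5): also look at the set actually returned by one query."""
    if len(rows) == 0:
        return "empty-result message"
    if len(rows) <= 10:  # assumed threshold: few enough to show each item in full
        return "detail cards"
    # For larger sets, fall back on what the metadata suggests.
    return choose_by_metadata(column_types)
```

The first function gives the same answer for every query against the same schema; the second can give a different answer each time the size of the returned set changes.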

Antonio's approach corresponds to (5), and could be described as much more user centered than most other approaches. Traditional database work used tables to visualise all data whatsoever, which is "general" but ignores everything about the data and everything about the user. Our own Iconographer work allowed the user to choose the visualisation, but is otherwise in the old "general" approach. Most work on automatic generation of visualisations, beginning with Mackinlay (an approach also present in Iconographer), selects visualisations based on properties of data types. This means it takes notice of the metadata, but not of the collection (e.g. how much data there is in the database or in each relation), and not of the particular query. Which approach is right?

We know that an enormous amount of programming time in applied database work is in fact spent in customising general database software to particular applications, and that much of this is spent in designing "forms": both the standard queries that will be used in that application AND the screen layout that will be used in presenting the results of each query. This is strong evidence that the generality of the underlying DBMS does not help real users, and that it all has to be undone and made specific by other programmers in order to produce the software that is actually used in most applications. This might be evidence in favour of a query-centered approach to visualisations, which would then not have to be hand-written as they often are now. On the other hand, as someone said in discussion, query-specific visualisations may confuse the user, because if they modify the query slightly the visualisation may totally change. However, a slight change in a query may cause an enormous change in the size of the set returned, and few visualisations (whether a text list, a set of icons, or some virtual reality display) can really be the same from the user's point of view when they change from showing 6 items to 600 items. In most cases the display suddenly switches to some form of scrolling, with most items off screen most of the time. Thus in reality, all approaches probably change the display in ways important to its usability when the query changes.

Another striking feature of Antonio's approach is that, as he revealed in answer to a question, his software may introduce an extra layer of structure into the display of the reply if the returned set is too large for the visualisation (e.g. too big to display as doors along a single corridor). This is comparable to what scrolling does: just as a printed book divides the text into pages and then into lines, which are usable subdivisions of the material but do not correspond to meaningful structural subdivisions, so scrolling divides screen displays. We are thus used, in both books and computer screens, to having two independent hierarchies of structure: the hierarchy of the material (e.g. chapters, sections and paragraphs) and the hierarchy of the display medium (e.g. lines and pages). Antonio's approach merges these two, using a single decision or design process for both. This sounds obviously better in principle: or is it really better to leave the user with the independent local controls that scrolling gives?
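
As a rough illustration of the extra layer of structure described above, the following Python sketch splits a returned set that is too large for one corridor into several corridors. The function name, the `max_doors` limit and the plain fixed-size split are assumptions made for illustration; how Antonio's software actually chooses its subdivisions is not described in this note.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_corridors(items: Sequence[T], max_doors: int) -> List[List[T]]:
    """Group items into corridors holding at most max_doors doors each.

    Hypothetical helper: a set that fits in one corridor comes back as a
    single group, and a larger set gains one extra level of structure.
    """
    if max_doors < 1:
        raise ValueError("max_doors must be at least 1")
    return [
        list(items[start:start + max_doors])
        for start in range(0, len(items), max_doors)
    ]
```

For example, `split_into_corridors(list(range(7)), 3)` gives three corridors holding 3, 3 and 1 items. A fixed-size split like this is as arbitrary as the page breaks in a book; a split on some attribute of the data would be closer to a meaningful subdivision.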