Last changed 18 Sep 2010. Length about 600 words (5,000 bytes).
(Document started on 17 Sep 2010.) This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/rap/fcal/tableformatdoc.html. You may copy it.

Notes on the rationale behind the table format tool

By Steve Draper, Department of Psychology, University of Glasgow.

The idea is to have a tool that does not replicate functionality (e.g. editing facilities, control of table borders, graphing) that is provided elsewhere. So the tool should make it easy to transfer data in and out.

The data is mostly treated as nominal (categorical) data. The exceptions are the sort operations, which treat values as ordinal, and the graphs, which alone treat numbers as ratio-scale data. Everywhere else, values are just names.
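A minimal sketch of the three treatments. The column of values and all the names here are hypothetical, not taken from the tool, and it is only an assumption that sorting should use numeric order.

```python
from collections import Counter

# Hypothetical column of cell values, held as text like any other table cell.
scores = ["10", "9", "100", "9"]

# Nominal: values are just names, so all we can do is test equality and count.
counts = Counter(scores)

# Ordinal: sorting needs an order; here we assume numeric order is intended.
ordered = sorted(scores, key=float)

# Ratio scale: only graph-style operations use magnitudes, e.g. bar heights
# scaled relative to the largest value.
largest = max(float(s) for s in scores)
bar_heights = [float(s) / largest for s in scores]
```

Only the last step relies on the numbers being meaningful as quantities; the first two would work equally well on any labels that have an agreed order.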

This is partly because changing which dimensions are displayed mainly makes sense when there are very few dimensions, each with rather few discrete values.

Relational databases are based on relational algebra. They can be thought of as representing data as a set of triples (say) where no two triples can be the same (or anyway, duplicates have no meaning and are discarded); and where having the same two values in the first two places doesn't stop you having different values in the 3rd place. A relation could have all possible combinations of values; but typically only a small fraction of the possibilities is present.
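The properties just described can be sketched with a set of tuples. The (student, course, grade) data is hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical relation of (student, course, grade) triples.
relation = {
    ("ann", "stats", "A"),
    ("ann", "stats", "B"),   # same first two values, different third: allowed
    ("bob", "stats", "A"),
    ("cat", "logic", "C"),
}
relation.add(("bob", "stats", "A"))  # duplicate: the set silently discards it

# Every combination the relation could hold, given the values seen in each place.
students = {s for s, _, _ in relation}
courses = {c for _, c, _ in relation}
grades = {g for _, _, g in relation}
possible = set(product(students, courses, grades))

# Fraction of the possible combinations that are actually present.
fraction_present = len(relation) / len(possible)
```

Here 4 of the 18 possible triples are present; with realistic data the fraction is usually far smaller.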

A sparse (e.g. 2-dimensional) matrix, where many values are missing or blank, can economically be represented by a relation.
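As a rough sketch of that economy, a mostly blank matrix can be reduced to one triple per filled cell. The matrix contents, and the use of None for a blank cell, are assumptions made for the example.

```python
# Hypothetical 3x4 matrix; None marks a blank cell.
matrix = [
    [None, 5,    None, None],
    [None, None, None, None],
    [7,    None, None, 2],
]

# Keep one (row, column, value) triple per non-blank cell.
triples = {
    (i, j, value)
    for i, row in enumerate(matrix)
    for j, value in enumerate(row)
    if value is not None
}

# Rebuild the full matrix from the triples to show nothing was lost.
# The overall size (3 rows, 4 columns) has to be remembered separately.
rebuilt = [[None] * 4 for _ in range(3)]
for i, j, value in triples:
    rebuilt[i][j] = value
```

Twelve cells are stored as three triples. The saving grows with the proportion of blank cells, and disappears if the matrix is nearly full.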

A function always gives a definite value, given any value of its arguments. Thus a function of 2 variables always returns the same (3rd) value (although possibly zero or NULL) for any particular pair of values. A function can be represented as a set of triples, but there would have to be a triple for every possible combination of values of the 2 parameters.
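One way to make the contrast concrete is a check on whether a given set of triples qualifies as a function of its first two places. The helper name is_function and the sample data are hypothetical.

```python
from itertools import product

def is_function(triples, xs, ys):
    """True if the triples give exactly one value for every (x, y) in xs x ys."""
    seen = {}
    for x, y, value in triples:
        if (x, y) in seen and seen[(x, y)] != value:
            return False          # two different values for one pair
        seen[(x, y)] = value
    # Every combination of the two parameters must be covered, and no others.
    return set(seen) == set(product(xs, ys))

xs, ys = ["a", "b"], [1, 2]
# A value is given for every pair, even if that value is None or zero.
total = {("a", 1, 10), ("a", 2, None), ("b", 1, 10), ("b", 2, 0)}
# Some pairs have no triple at all: a relation, but not a function.
partial = {("a", 1, 10), ("b", 2, 0)}
# One pair has two different values: also not a function.
ambiguous = total | {("a", 1, 99)}
```

With these definitions, is_function(total, xs, ys) holds, while the same call on partial or ambiguous does not.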

A table is a function from its 2 dimensions to a cell value. So:

There may be multiple cells with the same value (cell contents), i.e. many different (i,j) pairs yielding one cell value.
A table may not have all the values on a dimension that you would like.
There may be a lot of no-value cases (blank cells).
A lot could end up being collapsed on i=0 and j=0 after changing dimensions.

A relation is represented as a table whose columns are the places in each tuple (e.g. 3 columns for triples) and whose rows are the tuples themselves, in no particular order.

Statistical data is usually represented (that is, stats packages require this format) with one row per human participant, where the columns are all the different properties and measurements recorded for that one person. It is a relation: no duplicate rows, no missing values in a row. (Of course missing valid measurements are common, but they have to be filled in somehow.) But such data is a subtype of relations in general: one that corresponds to an "entity", where one column acts as a "key", e.g. the participant's name or subject number. All the other columns are then in fact functions of one variable (the key), each returning the value of that attribute for that participant.
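A small sketch of the last point, treating each column as a function of the key. The participant rows, the column names and the helper column_as_function are all hypothetical.

```python
# Hypothetical participant data: one row per person, "id" is the key column.
rows = [
    {"id": "s01", "age": 21, "score": 14},
    {"id": "s02", "age": 34, "score": 9},
    {"id": "s03", "age": 21, "score": 14},
]

def column_as_function(rows, key, attribute):
    """Turn one column into a mapping from key value to attribute value."""
    mapping = {}
    for row in rows:
        if row[key] in mapping:
            # A repeated key would mean the column is not a function of it.
            raise ValueError(f"duplicate key: {row[key]!r}")
        mapping[row[key]] = row[attribute]
    return mapping

age_of = column_as_function(rows, "id", "age")
score_of = column_as_function(rows, "id", "score")
```

Two participants may share every measured value (s01 and s03 above) and the rows still differ, because the key column tells them apart.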
