Web site logical path: [www.psy.gla.ac.uk] [~steve] [rap] [fcal] [this page]
The idea is to have a tool that does not replicate functionality (e.g. editing facilities, table border control, graphing) that is already provided elsewhere. So the tool should make it easy to transfer data in and out.
The data is mostly treated as nominal (categorical) data. There are two exceptions: the sort operations treat values as ordinal, and the graphs alone treat numbers as ratio-scale data. Everywhere else, values are just names.
This is partly because changing which dimensions are displayed mainly makes sense only when there are very few dimensions, each with rather few discrete values.
Relational databases are based on relational algebra. They can be thought of as representing data as a set of triples (say), where no two triples can be the same (or at any rate duplicates have no meaning and are discarded), and where having the same two values in the first two places doesn't stop you having different values in the 3rd place. A relation could contain all possible combinations of values, but typically only a small fraction of the possibilities are present.
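As a minimal sketch of this view (in Python, with made-up names and grades that are purely illustrative and not taken from the tool), a relation can be modelled as a set of triples. The set discards duplicates, and nothing stops two triples sharing their first two values:

```python
# A relation modelled as a set of triples (illustrative values only).
relation = {
    ("alice", "maths", "A"),
    ("alice", "maths", "B"),   # same first two values, different 3rd: allowed
    ("bob", "physics", "C"),
}

# Adding a duplicate triple has no effect: the set discards it.
relation.add(("bob", "physics", "C"))
size_after_duplicate = len(relation)

# Count how many triples could exist if every combination were present.
names = {t[0] for t in relation}
subjects = {t[1] for t in relation}
grades = {t[2] for t in relation}
possible = len(names) * len(subjects) * len(grades)
```

Here only 3 of the 12 possible combinations are actually present, which is the typical situation for a relation.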
A sparse (e.g. 2-dimensional) matrix, where many values are missing or blank, can be represented economically by a relation.
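One way to picture this, as a rough sketch only: store one (row, column, value) triple per non-blank cell and nothing for the rest. The helper name `to_dense` and the 5 by 5 size are assumptions made for the example.

```python
# Sparse 2-dimensional matrix stored as a relation of (row, col, value) triples.
# Missing / blank cells simply have no triple.
sparse = {
    (0, 2, 5.0),
    (3, 1, 7.5),
    (4, 4, 1.0),
}

def to_dense(triples, n_rows, n_cols, blank=None):
    """Expand the triples into a full list of lists, filling gaps with `blank`."""
    grid = [[blank] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        grid[row][col] = value
    return grid

dense = to_dense(sparse, 5, 5)
stored = len(sparse)                    # number of triples kept
cells = sum(len(r) for r in dense)      # number of cells in the full matrix
```

The relation keeps 3 triples where the full matrix needs 25 cells, most of them blank.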
A function always gives a definite value, given any value of its arguments. Thus a function of 2 variables always returns the same (3rd) value (although possibly zero or NULL) for any particular pair of values. A function can be represented as a set of triples, but there would have to be a triple for every possible combination of values of the 2 parameters.
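The difference from a general relation can be sketched as a check (the name `is_function` and the sample values are hypothetical): a set of triples represents a function of 2 variables only if every pair of argument values appears, and appears with a single 3rd value.

```python
from itertools import product

def is_function(triples, xs, ys):
    """True if every (x, y) pair in xs x ys has exactly one 3rd value."""
    seen = {}
    for x, y, value in triples:
        if (x, y) in seen and seen[(x, y)] != value:
            return False              # two different values for the same arguments
        seen[(x, y)] = value
    # Every possible combination of the two parameters must be present.
    return all(pair in seen for pair in product(xs, ys))

xs, ys = [1, 2], ["a", "b"]
total = {(1, "a", 10), (1, "b", None), (2, "a", 0), (2, "b", 20)}
partial = {(1, "a", 10)}                # most combinations missing
ambiguous = total | {(1, "a", 99)}      # two values for the pair (1, "a")

results = (is_function(total, xs, ys),
           is_function(partial, xs, ys),
           is_function(ambiguous, xs, ys))
```

Note that a value of zero or `None` still counts as a definite value; what disqualifies `partial` is the absence of a triple, not a blank value.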
A table is a function from its 2 dims to a cell/value. So:
A relation is represented as a table where the columns are the places in each tuple (e.g. 3 columns for triples), and the rows are the tuples themselves, in no particular order.
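The two views can be put side by side in a small sketch (the day and count values are invented for illustration): the same data as a function from 2 dimensions to a cell, and as a relation laid out with one row per tuple.

```python
# View 1: a table as a function from (row label, column label) to a cell value.
crosstab = {
    ("mon", "am"): 3, ("mon", "pm"): 0,
    ("tue", "am"): 1, ("tue", "pm"): 4,
}

# View 2: the same data as a relation, shown as a table with one column
# per place in the tuple and one row per tuple.
rows = [(day, part, count) for (day, part), count in crosstab.items()]

# Row order carries no meaning, so the rows are compared as sets.
same = set(rows) == set(reversed(rows))
```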
Statistics data is usually represented (stats packages require this format) with one row per human participant, the columns being all the different properties and measurements recorded for that one person. It is a relation: no duplicate rows, and no missing values in a row. (Of course missing measurements are common in practice, but the gaps have to be filled in somehow.) Such data is, however, a subtype of relations in general, one that corresponds to an "entity": one column acts as a "key", e.g. the participant's name or subject number. All the other columns are then in fact functions of one variable (the key), each returning the value of that attribute for that participant.
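A small sketch of this "entity" layout follows. The column names, participant ids and the helper `column_as_function` are all assumptions for illustration, and `None` stands in for however a missing measurement gets filled in.

```python
# One row per participant; the first column ("id") is assumed to be the key.
columns = ("id", "age", "group", "score")
rows = [
    ("p01", 21, "control", 14),
    ("p02", 34, "treatment", 19),
    ("p03", 21, "control", None),   # missing measurement filled with None
]

def column_as_function(rows, columns, key, attribute):
    """Return a dict mapping each key value to that row's attribute value."""
    k, a = columns.index(key), columns.index(attribute)
    mapping = {}
    for row in rows:
        if row[k] in mapping:
            raise ValueError(f"duplicate key: {row[k]!r}")
        mapping[row[k]] = row[a]
    return mapping

age_of = column_as_function(rows, columns, "id", "age")
score_of = column_as_function(rows, columns, "id", "score")
```

Each resulting dict is a function of one variable: given a participant's key it returns that participant's value for the attribute, and a repeated key is rejected because it would make the column ambiguous.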