2016-11-12

The rule of Database

The First Rule of Database

Every value should be atomic

  • i.e., not contain parts that we might want to work with separately.
  • For example; store personal and family names in separate columns
    • Instead of putting the entire name in one column
    • Don't have to use substring operations to get the name's components.
    • Splitting on spaces is unreliable:
    • Think of a name like "Eloise St. Cyr" or "Jan Mikkel Steubart"

Second Rule of Database

Rvery record should have a unique primary key.

  • This can be a serial number that has no intrinsic meaning,
    • Like the id field in the Person table,
  • A combination of values:
    • The triple (taken, person, quant) from the Survey table uniquely identifies every measurement.

Third Rule of Database

There should be no redundant information.

  • For example, we could get rid of the Site table and rewrite the Visited table like this:
|id   |lat   |long   |dated      |
|-----|------|-------|-----------|
|619  |-49.85|-128.57| 1927-02-08|
|622  |-49.85|-128.57| 1927-02-10|
|734  |-47.15|-126.72| 1930-01-07|
|735  |-47.15|-126.72| 1930-01-12|
|751  |-47.15|-126.72| 1930-02-26|
|752  |-47.15|-126.72| -null-    |
|837  |-48.87|-123.40| 1932-01-14|
|844  |-49.85|-128.57| 1932-03-22|

Problem with recording in One Single table

We could use a single table that recorded all the information about each reading in each row (just as a spreadsheet would). The problem:

  • Very hard to keep data organized this way consistent:
    • If we realize that the date of a particular visit to a particular site is wrong,
    • We have to change multiple records in the database.
    • What's worse, we may have to guess which records to change, since other sites may also have been visited on that date.

The Fourth Rule of Database

The units for every value should be stored explicitly (Our database doesn't do this and that's a problem):

  • Roerich's salinity measurements are several orders of magnitude larger than anyone else's,
    • But we don't know if that means she was using parts per million instead of parts per thousand,
    • Or whether there actually was a saline anomaly at that site in 1932.

Symbiotic Relationship

Data and the tools used to store it have a symbiotic relationship:

  • We use tables and joins because it's efficient, provided our data is organized a certain way,
  • But organize our data that way because we have tools to manipulate it efficiently.
  • As anthropologists say, the tool shapes the hand that shapes the tool.