2016-11-12

## The First Rule of Database

Every value should be atomic

• i.e., not contain parts that we might want to work with separately.
• For example; store personal and family names in separate columns
• Instead of putting the entire name in one column
• Don't have to use substring operations to get the name's components.
• Splitting on spaces is unreliable:
• Think of a name like "Eloise St. Cyr" or "Jan Mikkel Steubart"

## Second Rule of Database

Rvery record should have a unique primary key.

• This can be a serial number that has no intrinsic meaning,
• Like the id field in the Person table,
• A combination of values:
• The triple (taken, person, quant) from the Survey table uniquely identifies every measurement.

## Third Rule of Database

There should be no redundant information.

• For example, we could get rid of the Site table and rewrite the Visited table like this:
|id   |lat   |long   |dated      |
|-----|------|-------|-----------|
|619  |-49.85|-128.57| 1927-02-08|
|622  |-49.85|-128.57| 1927-02-10|
|734  |-47.15|-126.72| 1930-01-07|
|735  |-47.15|-126.72| 1930-01-12|
|751  |-47.15|-126.72| 1930-02-26|
|752  |-47.15|-126.72| -null-    |
|837  |-48.87|-123.40| 1932-01-14|
|844  |-49.85|-128.57| 1932-03-22|

## Problem with recording in One Single table

We could use a single table that recorded all the information about each reading in each row (just as a spreadsheet would). The problem:

• Very hard to keep data organized this way consistent:
• If we realize that the date of a particular visit to a particular site is wrong,
• We have to change multiple records in the database.
• What's worse, we may have to guess which records to change, since other sites may also have been visited on that date.

## The Fourth Rule of Database

The units for every value should be stored explicitly (Our database doesn't do this and that's a problem):

• Roerich's salinity measurements are several orders of magnitude larger than anyone else's,
• But we don't know if that means she was using parts per million instead of parts per thousand,
• Or whether there actually was a saline anomaly at that site in 1932.

## Symbiotic Relationship

Data and the tools used to store it have a symbiotic relationship:

• We use tables and joins because it's efficient, provided our data is organized a certain way,
• But organize our data that way because we have tools to manipulate it efficiently.
• As anthropologists say, the tool shapes the hand that shapes the tool.