Databases: The Relational Model and Relational Algebra
Basic Terms
Relation Schemas
Relation schemas are finite sets of attributes.
Notation for some relation with attributes and primary key :
The domain of a schema is the set of all possible tuples that can be associated with the schema.
For relation , this is denoted:
Keys
A superkey of a relation schema is any subset of that schema that can uniquely identify a tuple.
In other words, all tuples have unique values in the superkey.
Or more formally, for and any two distinct tuples , if for some , then is a superkey.
Note that for , is always a superkey.
A candidate key is a “minimal superkey”.
More formally, superkey is a candidate key if no subset exists that is also a superkey.
A primary key is the specific candidate key designated to a relation schema.
All relations must have exactly one primary key.
A foreign key of a relation schema is a subset of attributes that reference a particular primary key.
The referenced primary key is often a different relation, but may also be of the same relation.
Attributes
Attributes are labels for parts of a schema.
The domain of an attribute is the set of values that can be associated with the attribute.
For attribute , this is denoted:
Relation Instances
Relation instances are finite sets of tuples.
To show that is an instance of schema , we write:
Tuples
Tuples are mappings from schema to schema domain.
Concretely, a tuple can be thought of as a set of attribute-value pairs.
We will use the following notation in this cheatsheet:
extracts attribute set .
is itself a tuple.extracts a single attribute .
is a value. (Yes, I know this is confusing.)combines tuples and .
is itself a tuple.
The Null Value (Extended RA)
The null value is a special value that may be part of an attribute domain.
We will use the symbol to represent null in this cheatsheet.
While the null value is an important part of the relational model, it is not a core part of relational algebra.
Operators
Selection
Produces the subset of tuples that satisfy a condition.
For and selection condition :
Projection
Produces a relation with a subset of attributes.
For and attribute set :
Rename
Casts a relation to a different schema.
For and a compatible schema :
Union, Intersection, and Difference
Set theoretic operations between union-compatible relations (i.e. same schema).
For and :
Cartesian Product
Produces every possible combination of tuples from two relations.
For and :
Theta/Inner Join and Equijoin
Theta join combines the tuples of two relations using a matching criterion.
For and , and some arbitrary matching criterion :
Theta join can also be defined as:
Duplicate attributes are not removed.
Equijoin is a special case of theta join that only tests for equality between attributes.
Natural Join
A special case of equijoin that matches tuples on all their common attributes.
For and :
where is “all common attributes must match”:
Natural join can also be defined as:
where is:
and assumes removal of duplicate attributes.
Natural join is dangerous in real applications. Use equijoin to explicitly match tuples instead.
Division (Extended RA)
For and , with :
where:
In plain English: (1) For all tuples in , (2) there is at least one relation in (3) whose common attributes match, (4) and whose non-common attributes are the same.
Division can also be defined as:
where is “all possible result tuples”:
In , values and are the only values that occur with all values , , and .
Outer Join (Extended RA)
Similar to theta join, except in the result relation, we also include tuples that don’t have matches in the join.
These unmatched tuples get padded with null values in the result relation.
Full outer join includes all tuples of both operands:
Left outer join includes all tuples of the left:
Right outer join includes all tuples of the right:
To summarize the differences between the joins:
Inner/Theta
Left Outer
Right Outer
Full Outer
Grouping Operator (Extended RA)
Performs calculations over groups of tuples within a relation. Produces a relation containing the results.
For and operator subscript :
where is a list containing:
- one or more attributes from to be taken as grouping attributes, and
- one or more aggregate functions, written in the form , where is an aggregate function to be applied to attributes .
Formally, aggregate functions can be considered to take a multiset (i.e. a set with duplicates) of values.
Aggregate functions defined in SQL-92:
AVG, MAX, MIN, SUM, COUNT
Most major DBMS implementations offer many more aggregate functions and allow user-defined functions.
The function is interesting in that it doesn’t really matter which column you use.
Generalized Projection (Extended RA)
We can extend the projection operator to also contain expressions for computation.
Further Concepts
Functional Dependencies
TODO: Write about this!