This will be a post covering the common ground between database systems, set theory and relational algebra. It is intended as a basic refresher or an intro depending on the reader. I do assume some basic set theory notation knowledge on the part of the reader. Questions and clarifications are welcome.
Set : These are collections that are completely characterized by their elements. Two sets are equal only if they contain the same elements.
Relation (AKA table or entity set) : Given sets A relation is a subset of . It is set of n-tuples where
Each relation has a fixed number of columns that are explicitly names where each attribute name within a relation is unique. Rows/tuples in a relation have no ordering associated with them. A database has multiple relations.
Tuple (AKA row or entity instance): It is an element in a relation.
Attribute (AKA column): It describes a tuple in a relation.
Query Language : It is a language in which a user requests information from a database.
We will cover the 6 basic operators used in relational algebra. These operators can later be used in compositions to form higher order operations.
- Select ( )
- Project ( )
- Union ( )
- Set Difference ( )
- Cartesian Product ( )
- Rename ( )
These operations take one or two relations as inputs and produce a new relation as a result. This is a property known as closure.
Select ( )
in this example p is called the selection predicate. R is the relation upon which we perform the select operation. This operation will always return a subset of the initial relation. (remember that a set is also a subset of itself, ( ))
Formally, select is defined as
SELECT * EMPLOYEES WHERE firstName='John';
Will return all employees who's first name is John
Project ( )
are attribute names and R is the relation.
The result is defined as the relation of K columns obtained by erasing the columns that are not listed. Duplicate rows are removed since relations are sets.
SELECT firstName, lastName FROM EMPLOYEES;
Will return all employees, but only with their firstname and lastname
This requires two relations R and S and returns a relation in which every element is either a member of R or S.
to be valid the following conditions have to be met.
(i) R and S must have the same [arity]https://en.wikipedia.org/wiki/Arity)(same number of columns)
(ii) The attribute domains must be the same. That is, the corresponding columns must have the same data type.
If we have two relations, Sciences and Humanities and we want to find the courses offered in both we could run this query
SELECT courseName FROM Sciences UNION SELECT courseName FROM Humanities;
It will return a relation with only the course names from both Sciences and Humanities
This takes in two relations R and S and will return all the rows in R but not in S.
The shaded areas represent respectively
Similar to Unions, set differences must be taken between relations with the same arity.
In this situation we have to assume that the attributes of R and S are disjoint, that is . If they are not disjoint then renaming must be used.
The cartesian product will concatenate each row of R with each row of S. The resultant relation will have #rows_in_R x #rows_in_S. (If R has 2 rows, and S has 3 rows then the result will have 3x2=6 rows.)
It allows us to rename and therefore refer to the result of a relational algebra expression.
This will return the expression E under the name X.
Similarly if an expression E has an arity n, then
And this will return the result of expression E under the name X with the attributes renamed to
Note There are additional operators that do not add to the power of relational algebra, but help to simplify queries. These operators are compositions of the 6 basic operators. They are:
- Set Intersection
- Natural Join
- Assignment. I will explain these in a later post and link it here once I do so.
That's all folks
I'm trying to sharpen my database skills and as such, I figured I will start from the basics of the basics. More to come. Stay tuned.
Top comments (0)