The Story Behind the Relational Model
That proposal became the relational model, the foundation of every database you use today. Its query engine is relational algebra — a small set of operations (select, project, union, join, divide) that take tables in and give tables out. SQL is simply a friendly language sitting on top of this algebra.
A relation is a table; relational algebra is a set of operations where every input is a relation and every output is a relation — so results can be fed into more operations, like links in a chain.
Structure of a Relational Database
A relation (table) is built from a few precise parts. Learn their formal names once and every textbook definition becomes clear.
A relation = a set of tuples sharing the same attributes. Because it is a set, no two tuples are identical and order does not matter.
| Formal term | Everyday term | Meaning |
|---|---|---|
| Relation | Table | A named set of rows with fixed columns |
| Tuple | Row / record | One entry in the relation |
| Attribute | Column / field | A named property |
| Domain | Data type / allowed values | The set of legal values for an attribute |
| Degree | Number of columns | Count of attributes |
| Cardinality | Number of rows | Count of tuples |
Schema, Instance & Keys
Two more ideas complete the structure. The schema is the permanent design; the instance is the data in it right now.
| The structure: table name + attributes + types |
| Written as STUDENT(Roll, Name, Age, Dept) |
| Changes rarely |
| The actual set of tuples at a moment |
| The rows {101 Raj 22 CSE, 102 Sara 19 ECE…} |
| Changes every insert, update, delete |
A key guarantees each tuple can be uniquely identified. The hierarchy in brief:
| Key | Definition | Example |
|---|---|---|
| Super key | Any attribute set that is unique (may have extras) | {Roll}, {Roll, Name} |
| Candidate key | A minimal super key | {Roll}, {Email} |
| Primary key | The chosen candidate key | Roll |
| Foreign key | Refers to another relation's primary key | ENROLLED.Roll → STUDENT.Roll |
Relational Algebra — The Query Engine
Relational algebra is a procedural query language: you specify the operations and their order. Every operator consumes one or two relations and produces a new relation — a property called closure, which lets you nest operations freely.
Because the output is itself a relation, operators stack: the result of a join can be fed into a selection, then a projection.
| Operation | Symbol | Type | In one line |
|---|---|---|---|
| Selection | σ | Unary | Pick rows by a condition |
| Projection | π | Unary | Pick columns |
| Union | ∪ | Set | All rows of either relation |
| Intersection | ∩ | Set | Rows in both |
| Difference | − | Set | Rows in one but not the other |
| Join | ⋈ | Binary | Combine related rows |
| Division | ÷ | Binary | Rows matching all of another set |
The example tables used throughout the rest of this tutorial:
| STUDENT — Roll (PK) | Name | Age | Dept |
|---|---|---|---|
| 101 | Raj | 22 | CSE |
| 102 | Sara | 19 | ECE |
| 103 | Amit | 24 | CSE |
| 104 | Neha | 20 | ECE |
Selection ( σ ) — Pick the Rows
Selection keeps only the tuples that satisfy a condition — it slices the table horizontally. Written as σcondition(R). The structure of the table stays the same; only matching rows survive.
Green rows pass the test; struck-through rows are filtered out. Selection never changes the columns.
| Result of σAge > 20(STUDENT) | |||
|---|---|---|---|
| Roll | Name | Age | Dept |
| 101 | Raj | 22 | CSE |
| 103 | Amit | 24 | CSE |
-- SQL equivalent of selection: the WHERE clause
SELECT * FROM STUDENT WHERE Age > 20;
Projection ( π ) — Pick the Columns
Projection keeps only the listed columns — it slices the table vertically. Written as πcolumns(R). Crucially, because a relation is a set, projection removes duplicate rows from the result.
Four students live in just two departments — projection on Dept returns CSE, ECE, not four duplicate rows.
-- Projection = choosing columns; DISTINCT mirrors duplicate removal
SELECT DISTINCT Dept FROM STUDENT;
SELECT DISTINCT Name, Dept FROM STUDENT; -- π Name, Dept (STUDENT)
Union ( ∪ ) — Combine the Rows
Union stacks all tuples of two relations into one, removing duplicates. It works only if the two relations are union-compatible: the same number of attributes, in the same order, with matching domains.
The shared name Sara is kept only once — union eliminates duplicates automatically.
-- UNION removes duplicates; UNION ALL keeps them
SELECT Name FROM CRICKET
UNION
SELECT Name FROM CHESS;
Intersection (∩) returns rows in both relations (here, just Sara), and Difference (−) returns rows in the first but not the second (Cricket − Chess = Raj). Both also need union-compatibility.
Join ( ⋈ ) — Combine Related Rows
Join stitches together tuples from two relations that share a matching value. The natural join automatically matches on the common attribute (here, Roll) and merges the rows into one wider tuple.
| STUDENT | |||
|---|---|---|---|
| Roll | Name | Age | Dept |
| 101 | Raj | 22 | CSE |
| 103 | Amit | 24 | CSE |
| ENROLLED | |
|---|---|
| Roll | CID |
| 101 | C1 |
| 101 | C2 |
| 103 | C1 |
A natural join is a Cartesian product followed by a selection on equal common attributes, keeping one copy of the shared column.
| Result of STUDENT ⋈ ENROLLED | |||
|---|---|---|---|
| Roll | Name | Age | CID |
| 101 | Raj | 22 | C1 |
| 101 | Raj | 22 | C2 |
| 103 | Amit | 24 | C1 |
-- Natural join matches on the common column automatically
SELECT * FROM STUDENT NATURAL JOIN ENROLLED;
-- Equivalent explicit (equi) join
SELECT * FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll;
A theta join (⋈θ) joins on any condition; when that condition is equality it is an equi join; a natural join is an equi join on all common attributes that then drops the duplicate column.
Division ( ÷ ) — The "For All" Operator
Division takes ENROLLED(Roll, CID) ÷ COURSE(CID) and returns the Roll values that are paired with every CID present in COURSE.
| ENROLLED | |
|---|---|
| Roll | CID |
| 101 | C1 |
| 101 | C2 |
| 102 | C1 |
| 103 | C1 |
| 103 | C2 |
| COURSE (the divisor) |
|---|
| CID |
| C1 |
| C2 |
101 and 103 each appear with both C1 and C2, so they qualify; 102 lacks C2 and is dropped.
-- Division via GROUP BY / HAVING: enrolled in as many courses as exist
SELECT Roll FROM ENROLLED
WHERE CID IN (SELECT CID FROM COURSE)
GROUP BY Roll
HAVING COUNT(DISTINCT CID) = (SELECT COUNT(*) FROM COURSE);
Division also maps to a double-negation in SQL: "students for whom there is no course that they are not enrolled in." Two nested NOT EXISTS clauses express this — a famous and tricky pattern worth memorising.
Putting It Together — Nesting Operations
Closure lets us chain operators into one expression. Suppose we want the names of CSE students older than 20 who are enrolled in something:
Read it inside-out: join first, then filter rows, then keep only the Name column.
As a single relational-algebra expression and its SQL twin:
-- Relational algebra:
-- π Name ( σ Age>20 ∧ Dept='CSE' ( STUDENT ⋈ ENROLLED ) )
SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20 AND S.Dept = 'CSE';
Combining Operators — Rules & a Worked Example
Because every operator returns a relation, you can chain them into one query. But chaining safely needs two kinds of rules: precedence (which operator binds first) and equivalence rules (legal ways to rearrange a combined expression).
A. Operator Precedence
When parentheses are absent, operators apply in this order. Unary operators bind tightest; union and difference are loosest.
| Priority | Operators | Group |
|---|---|---|
| Highest | σ π ρ | Unary (selection, projection, rename) |
| Then | × ⋈ | Cartesian product, join |
| Then | ∩ | Intersection |
| Lowest | ∪ − | Union, difference |
Parentheses always override precedence and make intent obvious. σ(R ∪ S) is very different from σ(R) ∪ S — never leave it to the reader to guess.
B. Two Ways to Write a Combined Query
The same query can be one nested expression, or a sequence of steps using assignment (←) to name intermediate relations. Both compute identical results — pick whichever reads more clearly.
| πName( σAge>20(STUDENT) ⋈ ENROLLED ) |
| Compact; read it inside-out |
| Best for short queries |
| T1 ← σAge>20(STUDENT) |
| T2 ← T1 ⋈ ENROLLED |
| Result ← πName(T2) |
C. The Combine (Equivalence) Rules
These laws let you rewrite a combined expression without changing its result. They are the foundation of query optimization — the engine reshuffles operators to run faster.
| Rule | Equivalence | Why it helps |
|---|---|---|
| Cascade of σ | σc1 ∧ c2(R) ≡ σc1(σc2(R)) | Split or merge conditions |
| Commute σ | σc1(σc2(R)) ≡ σc2(σc1(R)) | Apply the cheaper filter first |
| Cascade of π | πL1(πL2(R)) ≡ πL1(R), if L1 ⊆ L2 | Drop redundant projections |
| Product + σ = Join | σθ(R × S) ≡ R ⋈θ S | Turn a slow product into a join |
| Selection push-down | σc(R ⋈ S) ≡ σc(R) ⋈ S (c on R only) | Filter before joining → smaller join |
| Commute ⋈ / ∪ | R ⋈ S ≡ S ⋈ R R ∪ S ≡ S ∪ R | Reorder for a better plan |
Selection push-down is the single most valuable rule. Joining two large tables and then filtering wastes work; filtering each table first shrinks the rows the join must process. Same answer, far less effort.
D. A Worked Example Using Many Operators
Using ENROLLED = {(101,C1), (101,C2), (103,C1)}, here is the full combined expression:
-- Relational algebra (nested):
-- π Name ( σ Dept='ECE' (STUDENT) )
-- ∪
-- π Name ( σ Age>20 (STUDENT) ⋈ ENROLLED )
Each branch is an independent pipeline; the union merges their name lists and removes any duplicates.
Step by step, with the example data:
| Final Result — Name |
|---|
| Sara |
| Neha |
| Raj |
| Amit |
-- The same combined query in SQL
SELECT Name FROM STUDENT WHERE Dept = 'ECE'
UNION
SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20;
This query combined selection (two filters), join, projection (two name-lists), and union. Thanks to closure, they slot together cleanly — and by the push-down rule, doing each selection before the join keeps it fast.
Relational Algebra ↔ SQL Cheat Sheet
| Operation | Algebra | SQL |
|---|---|---|
| Selection | σAge>20(R) | WHERE Age > 20 |
| Projection | πName(R) | SELECT DISTINCT Name |
| Union | R ∪ S | SELECT… UNION SELECT… |
| Intersection | R ∩ S | INTERSECT |
| Difference | R − S | EXCEPT / MINUS |
| Join | R ⋈ S | JOIN… ON / NATURAL JOIN |
| Division | R ÷ S | GROUP BY… HAVING COUNT / double NOT EXISTS |
Common Mistakes to Avoid
Selection (σ) keeps rows; projection (π) keeps columns. The names feel swapped because "select" in SQL actually does projection — in relational algebra they are opposites.
Algebra treats relations as sets, so π silently drops duplicate rows. In SQL you must add DISTINCT to reproduce that behaviour.
You cannot union a 2-column relation with a 3-column one, nor mix a Name column with an Age column. Same arity, same order, same domains — or the operation is undefined.