DBMS 📂 Entity-Relationship (ER) Model · 5 of 5 45 min read

Relational Model & Relational Algebra: Select, Project, Join, Divide

A complete, example-driven guide to the relational model and relational algebra. It explains the structure of relational databases (relations, tuples, attributes, domains, keys) and then walks through every core operation — selection, projection, union, join, and division — with animated diagrams, real before/after tables, and the matching SQL for each.

Section 01

The Story Behind the Relational Model

The Idea That Tamed Data
In 1970, a researcher named E. F. Codd made a radical proposal: stop chasing data through tangled pointers and chains, and instead store everything in plain tables — rows and columns, like a tidy ledger. Anyone could read a table, and a clean piece of algebra could pull any answer out of it.

That proposal became the relational model, the foundation of every database you use today. Its query engine is relational algebra — a small set of operations (select, project, union, join, divide) that take tables in and give tables out. SQL is simply a friendly language sitting on top of this algebra.
💡
The Core Idea in One Line

A relation is a table; relational algebra is a set of operations where every input is a relation and every output is a relation — so results can be fed into more operations, like links in a chain.


Section 02

Structure of a Relational Database

A relation (table) is built from a few precise parts. Learn their formal names once and every textbook definition becomes clear.

📊 Anatomy of a Relation
Roll Name Age 101 Raj 22 102 Sara 19 103 Amit 24 STUDENT (relation name) Attribute (column) Tuple (row) Domain of Age = integers 0..150 Degree = 3 (no. of attributes) Cardinality = 3 (no. of tuples)

A relation = a set of tuples sharing the same attributes. Because it is a set, no two tuples are identical and order does not matter.

Formal termEveryday termMeaning
RelationTableA named set of rows with fixed columns
TupleRow / recordOne entry in the relation
AttributeColumn / fieldA named property
DomainData type / allowed valuesThe set of legal values for an attribute
DegreeNumber of columnsCount of attributes
CardinalityNumber of rowsCount of tuples

Section 03

Schema, Instance & Keys

Two more ideas complete the structure. The schema is the permanent design; the instance is the data in it right now.

📄 Schema (the design)
The structure: table name + attributes + types
Written as STUDENT(Roll, Name, Age, Dept)
Changes rarely
📋 Instance (the data)
The actual set of tuples at a moment
The rows {101 Raj 22 CSE, 102 Sara 19 ECE…}
Changes every insert, update, delete

A key guarantees each tuple can be uniquely identified. The hierarchy in brief:

KeyDefinitionExample
Super keyAny attribute set that is unique (may have extras){Roll}, {Roll, Name}
Candidate keyA minimal super key{Roll}, {Email}
Primary keyThe chosen candidate keyRoll
Foreign keyRefers to another relation's primary keyENROLLED.Roll → STUDENT.Roll

Section 04

Relational Algebra — The Query Engine

Relational algebra is a procedural query language: you specify the operations and their order. Every operator consumes one or two relations and produces a new relation — a property called closure, which lets you nest operations freely.

⚙️ Relations In → Operator → Relation Out
Relation(s) operator Relation

Because the output is itself a relation, operators stack: the result of a join can be fed into a selection, then a projection.

OperationSymbolTypeIn one line
SelectionσUnaryPick rows by a condition
ProjectionπUnaryPick columns
UnionSetAll rows of either relation
IntersectionSetRows in both
DifferenceSetRows in one but not the other
JoinBinaryCombine related rows
Division÷BinaryRows matching all of another set

The example tables used throughout the rest of this tutorial:

STUDENT — Roll (PK)NameAgeDept
101Raj22CSE
102Sara19ECE
103Amit24CSE
104Neha20ECE

Section 05

Selection ( σ ) — Pick the Rows

Selection keeps only the tuples that satisfy a condition — it slices the table horizontally. Written as σcondition(R). The structure of the table stays the same; only matching rows survive.

σ Keep rows where Age > 20
σ Age > 20 (STUDENT) Roll Name Age 101Raj22 102Sara19 103Amit24 104Neha20 kept ✔ kept ✔

Green rows pass the test; struck-through rows are filtered out. Selection never changes the columns.

Result of σAge > 20(STUDENT)
RollNameAgeDept
101Raj22CSE
103Amit24CSE
-- SQL equivalent of selection: the WHERE clause
SELECT * FROM STUDENT WHERE Age > 20;

Section 06

Projection ( π ) — Pick the Columns

Projection keeps only the listed columns — it slices the table vertically. Written as πcolumns(R). Crucially, because a relation is a set, projection removes duplicate rows from the result.

π Keep only the Dept column — duplicates collapse
π Dept (STUDENT) Roll Name Dept 101RajCSE 102SaraECE 103AmitCSE 104NehaECE dedup Dept CSE ECE only 2 rows

Four students live in just two departments — projection on Dept returns CSE, ECE, not four duplicate rows.

-- Projection = choosing columns; DISTINCT mirrors duplicate removal
SELECT DISTINCT Dept FROM STUDENT;
SELECT DISTINCT Name, Dept FROM STUDENT;   -- π Name, Dept (STUDENT)

Section 07

Union ( ∪ ) — Combine the Rows

Union stacks all tuples of two relations into one, removing duplicates. It works only if the two relations are union-compatible: the same number of attributes, in the same order, with matching domains.

∪ Cricket players ∪ Chess players
CRICKET Raj Sara CHESS Sara Amit PLAYERS Raj Sara Amit Sara appears once, not twice

The shared name Sara is kept only once — union eliminates duplicates automatically.

-- UNION removes duplicates; UNION ALL keeps them
SELECT Name FROM CRICKET
UNION
SELECT Name FROM CHESS;
💡
The Two Cousins of Union

Intersection (∩) returns rows in both relations (here, just Sara), and Difference (−) returns rows in the first but not the second (Cricket − Chess = Raj). Both also need union-compatibility.


Section 08

Join ( ⋈ ) — Combine Related Rows

Join stitches together tuples from two relations that share a matching value. The natural join automatically matches on the common attribute (here, Roll) and merges the rows into one wider tuple.

STUDENT
RollNameAgeDept
101Raj22CSE
103Amit24CSE
ENROLLED
RollCID
101C1
101C2
103C1
⋈ STUDENT ⋈ ENROLLED — matched on Roll
101 Raj, 22 101 C1 same Roll 101 Raj, 22 CSE C1 one combined tuple (Roll appears once)

A natural join is a Cartesian product followed by a selection on equal common attributes, keeping one copy of the shared column.

Result of STUDENT ⋈ ENROLLED
RollNameAgeCID
101Raj22C1
101Raj22C2
103Amit24C1
-- Natural join matches on the common column automatically
SELECT * FROM STUDENT NATURAL JOIN ENROLLED;

-- Equivalent explicit (equi) join
SELECT * FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll;
🧮
Theta, Equi & Natural Joins

A theta join (⋈θ) joins on any condition; when that condition is equality it is an equi join; a natural join is an equi join on all common attributes that then drops the duplicate column.


Section 09

Division ( ÷ ) — The "For All" Operator

"Which Students Took Every Course?"
The dean asks for the students who have enrolled in all the courses on offer — not some, not most, but every single one. Selection and join can find students in a course; only division can express "matches all of a set." Whenever a question says "for all" or "every", reach for the division operator.

Division takes ENROLLED(Roll, CID) ÷ COURSE(CID) and returns the Roll values that are paired with every CID present in COURSE.

ENROLLED
RollCID
101C1
101C2
102C1
103C1
103C2
COURSE (the divisor)
CID
C1
C2
÷ Who is enrolled in both C1 and C2?
Required set = { C1, C2 } 101 → C1, C2 has all ✔ 102 → C1 missing C2 ✘ 103 → C1, C2 has all ✔ RESULT (Roll) 101 103 Only students paired with every required CID survive division

101 and 103 each appear with both C1 and C2, so they qualify; 102 lacks C2 and is dropped.

-- Division via GROUP BY / HAVING: enrolled in as many courses as exist
SELECT Roll FROM ENROLLED
WHERE CID IN (SELECT CID FROM COURSE)
GROUP BY Roll
HAVING COUNT(DISTINCT CID) = (SELECT COUNT(*) FROM COURSE);
⚠️
The Classic "Double NOT EXISTS"

Division also maps to a double-negation in SQL: "students for whom there is no course that they are not enrolled in." Two nested NOT EXISTS clauses express this — a famous and tricky pattern worth memorising.


Section 10

Putting It Together — Nesting Operations

Closure lets us chain operators into one expression. Suppose we want the names of CSE students older than 20 who are enrolled in something:

🔗 A three-stage relational-algebra pipeline
STUDENT ⋈ ENROLLED join σ Age>20 ∧ Dept='CSE' select π Name project Names

Read it inside-out: join first, then filter rows, then keep only the Name column.

As a single relational-algebra expression and its SQL twin:

-- Relational algebra:
-- π Name ( σ Age>20 ∧ Dept='CSE' ( STUDENT ⋈ ENROLLED ) )

SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20 AND S.Dept = 'CSE';

Section 11

Combining Operators — Rules & a Worked Example

Because every operator returns a relation, you can chain them into one query. But chaining safely needs two kinds of rules: precedence (which operator binds first) and equivalence rules (legal ways to rearrange a combined expression).

A. Operator Precedence

When parentheses are absent, operators apply in this order. Unary operators bind tightest; union and difference are loosest.

PriorityOperatorsGroup
Highestσ  π  ρUnary (selection, projection, rename)
Then×  ⋈Cartesian product, join
ThenIntersection
Lowest∪  −Union, difference
💡
When in Doubt, Parenthesise

Parentheses always override precedence and make intent obvious. σ(R ∪ S) is very different from σ(R) ∪ S — never leave it to the reader to guess.

B. Two Ways to Write a Combined Query

The same query can be one nested expression, or a sequence of steps using assignment (←) to name intermediate relations. Both compute identical results — pick whichever reads more clearly.

🟪 Nested (one expression)
πName( σAge>20(STUDENT) ⋈ ENROLLED )
Compact; read it inside-out
Best for short queries
🧮 Sequence (assignment ←)
T1 ← σAge>20(STUDENT)
T2 ← T1 ⋈ ENROLLED
Result ← πName(T2)

C. The Combine (Equivalence) Rules

These laws let you rewrite a combined expression without changing its result. They are the foundation of query optimization — the engine reshuffles operators to run faster.

RuleEquivalenceWhy it helps
Cascade of σσc1 ∧ c2(R) ≡ σc1c2(R))Split or merge conditions
Commute σσc1c2(R)) ≡ σc2c1(R))Apply the cheaper filter first
Cascade of ππL1L2(R)) ≡ πL1(R), if L1 ⊆ L2Drop redundant projections
Product + σ = Joinσθ(R × S) ≡ R ⋈θ STurn a slow product into a join
Selection push-downσc(R ⋈ S) ≡ σc(R) ⋈ S  (c on R only)Filter before joining → smaller join
Commute ⋈ / ∪R ⋈ S ≡ S ⋈ R   R ∪ S ≡ S ∪ RReorder for a better plan
The Golden Optimization Move: Filter Early

Selection push-down is the single most valuable rule. Joining two large tables and then filtering wastes work; filtering each table first shrinks the rows the join must process. Same answer, far less effort.

D. A Worked Example Using Many Operators

"Names of ECE Students, Plus Older Enrolled Students"
The office wants a single list of student names who are either in the ECE department, or are over 20 and enrolled in at least one course. One question, but it needs selection, join, projection, and union working together.

Using ENROLLED = {(101,C1), (101,C2), (103,C1)}, here is the full combined expression:

-- Relational algebra (nested):
--   π Name ( σ Dept='ECE' (STUDENT) )
--   ∪
--   π Name ( σ Age>20 (STUDENT) ⋈ ENROLLED )
🔗 Two branches merging through a union
STUDENT σ Dept='ECE' π Name STUDENT σ Age>20 ⋈ ENROLLED π Name Result Branch 1 (ECE names) ∪ Branch 2 (older enrolled names) → final list

Each branch is an independent pipeline; the union merges their name lists and removes any duplicates.

Step by step, with the example data:

🧮 Evaluating the Expression
Step 1
σDept='ECE'(STUDENT) = {Sara, Neha} → π Name → {Sara, Neha}
Step 2
σAge>20(STUDENT) = {Raj (101), Amit (103)}
Step 3
Join with ENROLLED keeps only enrolled ones → Raj, Amit (both appear) → π Name → {Raj, Amit}
Step 4
Union: {Sara, Neha} ∪ {Raj, Amit} = {Sara, Neha, Raj, Amit}
Final Result — Name
Sara
Neha
Raj
Amit
-- The same combined query in SQL
SELECT Name FROM STUDENT WHERE Dept = 'ECE'
UNION
SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20;
🧮
Four Operators, One Answer

This query combined selection (two filters), join, projection (two name-lists), and union. Thanks to closure, they slot together cleanly — and by the push-down rule, doing each selection before the join keeps it fast.


Section 12

Relational Algebra ↔ SQL Cheat Sheet

OperationAlgebraSQL
SelectionσAge>20(R)WHERE Age > 20
ProjectionπName(R)SELECT DISTINCT Name
UnionR ∪ SSELECT… UNION SELECT…
IntersectionR ∩ SINTERSECT
DifferenceR − SEXCEPT / MINUS
JoinR ⋈ SJOIN… ON / NATURAL JOIN
DivisionR ÷ SGROUP BY… HAVING COUNT / double NOT EXISTS

Section 13

Common Mistakes to Avoid

⚠️
Mistake 1 — Confusing selection with projection

Selection (σ) keeps rows; projection (π) keeps columns. The names feel swapped because "select" in SQL actually does projection — in relational algebra they are opposites.

⚠️
Mistake 2 — Forgetting projection removes duplicates

Algebra treats relations as sets, so π silently drops duplicate rows. In SQL you must add DISTINCT to reproduce that behaviour.

⚠️
Mistake 3 — Union without union-compatibility

You cannot union a 2-column relation with a 3-column one, nor mix a Name column with an Age column. Same arity, same order, same domains — or the operation is undefined.


Section 14

Golden Rules of Relational Algebra

🏆 Relational Algebra — Non-Negotiable Rules
1
Every operator takes relations and returns a relation. This closure is what lets you nest operations into one query.
2
σ works on rows, π works on columns. Selection is horizontal, projection is vertical.
3
Results are sets, so duplicates vanish. Projection and union both eliminate repeated tuples automatically.
4
Set operations need union-compatibility. Union, intersection, and difference require identical structure.
5
Join needs a common attribute; division answers "for all". Use join to combine, division when the question says "every".
6
Relational algebra is the blueprint for SQL. Knowing the algebra makes every SQL query easier to reason about and optimise.
You have completed Entity-Relationship (ER) Model. View all sections →