Relational Algebra: Select, Project, Join, Divide

Section 01

The Story Behind the Relational Model

📖 A Bit of History

The Idea That Tamed Data

In 1970, a researcher named E. F. Codd made a radical proposal: stop chasing data through tangled pointers and chains, and instead store everything in plain tables — rows and columns, like a tidy ledger. Anyone could read a table, and a clean piece of algebra could pull any answer out of it.

That proposal became the relational model, the foundation of every database you use today. Its query engine is relational algebra — a small set of operations (select, project, union, join, divide) that take tables in and give tables out. SQL is simply a friendly language sitting on top of this algebra.

💡

The Core Idea in One Line

A relation is a table; relational algebra is a set of operations where every input is a relation and every output is a relation — so results can be fed into more operations, like links in a chain.

Section 02

Structure of a Relational Database

A relation (table) is built from a few precise parts. Learn their formal names once and every textbook definition becomes clear.

📊 Anatomy of a Relation

A relation = a set of tuples sharing the same attributes. Because it is a set, no two tuples are identical and order does not matter.

Formal term	Everyday term	Meaning
Relation	Table	A named set of rows with fixed columns
Tuple	Row / record	One entry in the relation
Attribute	Column / field	A named property
Domain	Data type / allowed values	The set of legal values for an attribute
Degree	Number of columns	Count of attributes
Cardinality	Number of rows	Count of tuples

Section 03

Schema, Instance & Keys

Two more ideas complete the structure. The schema is the permanent design; the instance is the data in it right now.

📄 Schema (the design)

The structure: table name + attributes + types

Written as STUDENT(Roll, Name, Age, Dept)

Changes rarely

📋 Instance (the data)

The actual set of tuples at a moment

The rows {101 Raj 22 CSE, 102 Sara 19 ECE…}

Changes every insert, update, delete

A key guarantees each tuple can be uniquely identified. The hierarchy in brief:

Key	Definition	Example
Super key	Any attribute set that is unique (may have extras)	{Roll}, {Roll, Name}
Candidate key	A minimal super key	{Roll}, {Email}
Primary key	The chosen candidate key	Roll
Foreign key	Refers to another relation's primary key	ENROLLED.Roll → STUDENT.Roll

Section 04

Relational Algebra — The Query Engine

Relational algebra is a procedural query language: you specify the operations and their order. Every operator consumes one or two relations and produces a new relation — a property called closure, which lets you nest operations freely.

⚙️ Relations In → Operator → Relation Out

Because the output is itself a relation, operators stack: the result of a join can be fed into a selection, then a projection.

Operation	Symbol	Type	In one line
Selection	σ	Unary	Pick rows by a condition
Projection	π	Unary	Pick columns
Union	∪	Set	All rows of either relation
Intersection	∩	Set	Rows in both
Difference	−	Set	Rows in one but not the other
Join	⋈	Binary	Combine related rows
Division	÷	Binary	Rows matching all of another set

The example tables used throughout the rest of this tutorial:

STUDENT — Roll (PK)	Name	Age	Dept
101	Raj	22	CSE
102	Sara	19	ECE
103	Amit	24	CSE
104	Neha	20	ECE

Section 05

Selection ( σ ) — Pick the Rows

Selection keeps only the tuples that satisfy a condition — it slices the table horizontally. Written as σ_condition(R). The structure of the table stays the same; only matching rows survive.

σ Keep rows where Age > 20

Green rows pass the test; struck-through rows are filtered out. Selection never changes the columns.

Result of σ_{Age > 20}(STUDENT)
Roll	Name	Age	Dept
101	Raj	22	CSE
103	Amit	24	CSE

-- SQL equivalent of selection: the WHERE clause
SELECT * FROM STUDENT WHERE Age > 20;

Section 06

Projection ( π ) — Pick the Columns

Projection keeps only the listed columns — it slices the table vertically. Written as π_columns(R). Crucially, because a relation is a set, projection removes duplicate rows from the result.

π Keep only the Dept column — duplicates collapse

Four students live in just two departments — projection on Dept returns CSE, ECE, not four duplicate rows.

-- Projection = choosing columns; DISTINCT mirrors duplicate removal
SELECT DISTINCT Dept FROM STUDENT;
SELECT DISTINCT Name, Dept FROM STUDENT;   -- π Name, Dept (STUDENT)

Section 07

Union ( ∪ ) — Combine the Rows

Union stacks all tuples of two relations into one, removing duplicates. It works only if the two relations are union-compatible: the same number of attributes, in the same order, with matching domains.

∪ Cricket players ∪ Chess players

The shared name Sara is kept only once — union eliminates duplicates automatically.

-- UNION removes duplicates; UNION ALL keeps them
SELECT Name FROM CRICKET
UNION
SELECT Name FROM CHESS;

💡

The Two Cousins of Union

Intersection (∩) returns rows in both relations (here, just Sara), and Difference (−) returns rows in the first but not the second (Cricket − Chess = Raj). Both also need union-compatibility.

Section 08

Join ( ⋈ ) — Combine Related Rows

Join stitches together tuples from two relations that share a matching value. The natural join automatically matches on the common attribute (here, Roll) and merges the rows into one wider tuple.

STUDENT
Roll	Name	Age	Dept
101	Raj	22	CSE
103	Amit	24	CSE

ENROLLED
Roll	CID
101	C1
101	C2
103	C1

⋈ STUDENT ⋈ ENROLLED — matched on Roll

A natural join is a Cartesian product followed by a selection on equal common attributes, keeping one copy of the shared column.

Result of STUDENT ⋈ ENROLLED
Roll	Name	Age	CID
101	Raj	22	C1
101	Raj	22	C2
103	Amit	24	C1

-- Natural join matches on the common column automatically
SELECT * FROM STUDENT NATURAL JOIN ENROLLED;

-- Equivalent explicit (equi) join
SELECT * FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll;

🧮

Theta, Equi & Natural Joins

A theta join (⋈_θ) joins on any condition; when that condition is equality it is an equi join; a natural join is an equi join on all common attributes that then drops the duplicate column.

Section 09

Division ( ÷ ) — The "For All" Operator

📖 Story

"Which Students Took Every Course?"

The dean asks for the students who have enrolled in all the courses on offer — not some, not most, but every single one. Selection and join can find students in a course; only division can express "matches all of a set." Whenever a question says "for all" or "every", reach for the division operator.

Division takes ENROLLED(Roll, CID) ÷ COURSE(CID) and returns the Roll values that are paired with every CID present in COURSE.

ENROLLED
Roll	CID
101	C1
101	C2
102	C1
103	C1
103	C2

COURSE (the divisor)
CID
C1
C2

÷ Who is enrolled in both C1 and C2?

101 and 103 each appear with both C1 and C2, so they qualify; 102 lacks C2 and is dropped.

-- Division via GROUP BY / HAVING: enrolled in as many courses as exist
SELECT Roll FROM ENROLLED
WHERE CID IN (SELECT CID FROM COURSE)
GROUP BY Roll
HAVING COUNT(DISTINCT CID) = (SELECT COUNT(*) FROM COURSE);

⚠️

The Classic "Double NOT EXISTS"

Division also maps to a double-negation in SQL: "students for whom there is no course that they are not enrolled in." Two nested NOT EXISTS clauses express this — a famous and tricky pattern worth memorising.

Section 10

Putting It Together — Nesting Operations

Closure lets us chain operators into one expression. Suppose we want the names of CSE students older than 20 who are enrolled in something:

🔗 A three-stage relational-algebra pipeline

Read it inside-out: join first, then filter rows, then keep only the Name column.

As a single relational-algebra expression and its SQL twin:

-- Relational algebra:
-- π Name ( σ Age>20 ∧ Dept='CSE' ( STUDENT ⋈ ENROLLED ) )

SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20 AND S.Dept = 'CSE';

Section 11

Combining Operators — Rules & a Worked Example

Because every operator returns a relation, you can chain them into one query. But chaining safely needs two kinds of rules: precedence (which operator binds first) and equivalence rules (legal ways to rearrange a combined expression).

A. Operator Precedence

When parentheses are absent, operators apply in this order. Unary operators bind tightest; union and difference are loosest.

Priority	Operators	Group
Highest	σ π ρ	Unary (selection, projection, rename)
Then	× ⋈	Cartesian product, join
Then	∩	Intersection
Lowest	∪ −	Union, difference

💡

When in Doubt, Parenthesise

Parentheses always override precedence and make intent obvious. σ(R ∪ S) is very different from σ(R) ∪ S — never leave it to the reader to guess.

B. Two Ways to Write a Combined Query

The same query can be one nested expression, or a sequence of steps using assignment (←) to name intermediate relations. Both compute identical results — pick whichever reads more clearly.

🟪 Nested (one expression)

π_Name( σ_Age>20(STUDENT) ⋈ ENROLLED )

Compact; read it inside-out

Best for short queries

🧮 Sequence (assignment ←)

T1 ← σ_Age>20(STUDENT)

T2 ← T1 ⋈ ENROLLED

Result ← π_Name(T2)

C. The Combine (Equivalence) Rules

These laws let you rewrite a combined expression without changing its result. They are the foundation of query optimization — the engine reshuffles operators to run faster.

Rule	Equivalence	Why it helps
Cascade of σ	σ_{c1 ∧ c2}(R) ≡ σ_c1(σ_c2(R))	Split or merge conditions
Commute σ	σ_c1(σ_c2(R)) ≡ σ_c2(σ_c1(R))	Apply the cheaper filter first
Cascade of π	π_L1(π_L2(R)) ≡ π_L1(R), if L1 ⊆ L2	Drop redundant projections
Product + σ = Join	σ_θ(R × S) ≡ R ⋈_θ S	Turn a slow product into a join
Selection push-down	σ_c(R ⋈ S) ≡ σ_c(R) ⋈ S (c on R only)	Filter before joining → smaller join
Commute ⋈ / ∪	R ⋈ S ≡ S ⋈ R R ∪ S ≡ S ∪ R	Reorder for a better plan

⚡

The Golden Optimization Move: Filter Early

Selection push-down is the single most valuable rule. Joining two large tables and then filtering wastes work; filtering each table first shrinks the rows the join must process. Same answer, far less effort.

D. A Worked Example Using Many Operators

📖 The Request

"Names of ECE Students, Plus Older Enrolled Students"

The office wants a single list of student names who are either in the ECE department, or are over 20 and enrolled in at least one course. One question, but it needs selection, join, projection, and union working together.

Using ENROLLED = {(101,C1), (101,C2), (103,C1)}, here is the full combined expression:

-- Relational algebra (nested):
--   π Name ( σ Dept='ECE' (STUDENT) )
--   ∪
--   π Name ( σ Age>20 (STUDENT) ⋈ ENROLLED )

🔗 Two branches merging through a union

Each branch is an independent pipeline; the union merges their name lists and removes any duplicates.

Step by step, with the example data:

🧮 Evaluating the Expression

Step 1

σ_Dept='ECE'(STUDENT) = {Sara, Neha} → π Name → {Sara, Neha}

Step 2

σ_Age>20(STUDENT) = {Raj (101), Amit (103)}

Step 3

Join with ENROLLED keeps only enrolled ones → Raj, Amit (both appear) → π Name → {Raj, Amit}

Step 4

Union: {Sara, Neha} ∪ {Raj, Amit} = {Sara, Neha, Raj, Amit}

Final Result — Name
Sara
Neha
Raj
Amit

-- The same combined query in SQL
SELECT Name FROM STUDENT WHERE Dept = 'ECE'
UNION
SELECT DISTINCT S.Name
FROM STUDENT S JOIN ENROLLED E ON S.Roll = E.Roll
WHERE S.Age > 20;

🧮

Four Operators, One Answer

This query combined selection (two filters), join, projection (two name-lists), and union. Thanks to closure, they slot together cleanly — and by the push-down rule, doing each selection before the join keeps it fast.

Section 12

Relational Algebra ↔ SQL Cheat Sheet

Operation	Algebra	SQL
Selection	σ_Age>20(R)	WHERE Age > 20
Projection	π_Name(R)	SELECT DISTINCT Name
Union	R ∪ S	SELECT… UNION SELECT…
Intersection	R ∩ S	INTERSECT
Difference	R − S	EXCEPT / MINUS
Join	R ⋈ S	JOIN… ON / NATURAL JOIN
Division	R ÷ S	GROUP BY… HAVING COUNT / double NOT EXISTS

Section 13

Common Mistakes to Avoid

⚠️

Mistake 1 — Confusing selection with projection

Selection (σ) keeps rows; projection (π) keeps columns. The names feel swapped because "select" in SQL actually does projection — in relational algebra they are opposites.

⚠️

Mistake 2 — Forgetting projection removes duplicates

Algebra treats relations as sets, so π silently drops duplicate rows. In SQL you must add DISTINCT to reproduce that behaviour.

⚠️

Mistake 3 — Union without union-compatibility

You cannot union a 2-column relation with a 3-column one, nor mix a Name column with an Age column. Same arity, same order, same domains — or the operation is undefined.

Section 14

Golden Rules of Relational Algebra

🏆 Relational Algebra — Non-Negotiable Rules

Every operator takes relations and returns a relation. This closure is what lets you nest operations into one query.

σ works on rows, π works on columns. Selection is horizontal, projection is vertical.

Results are sets, so duplicates vanish. Projection and union both eliminate repeated tuples automatically.

Set operations need union-compatibility. Union, intersection, and difference require identical structure.

Join needs a common attribute; division answers "for all". Use join to combine, division when the question says "every".

Relational algebra is the blueprint for SQL. Knowing the algebra makes every SQL query easier to reason about and optimise.