Data Models & Three-Schema Architecture in DBMS

Section 01

The Story That Explains Data Models

📖 Real World Analogy

Three Ways to Draw the Same Family

Imagine you must record the people in a large joint family so anyone can look them up. Cousin A draws a family tree — one ancestor at the top, branches fanning downward, every person having exactly one parent above them. Clean, but it cannot show that two cousins married into the family from two different sides.

Cousin B says "people connect in many directions" and draws a web of arrows — anyone can link to anyone. It captures every relationship, but after fifty people the diagram looks like a plate of noodles, and finding one person means tracing arrows by hand.

Cousin C shrugs and makes a simple table: one row per person, columns for name, parent, and spouse. Relationships are just matching values between rows. Boring to look at — but trivial to search, change, and reason about.

Those three cousins just reinvented the hierarchical, network, and relational data models. This tutorial is the story of why Cousin C won.

A data model is the lens through which we describe data, its relationships, its meaning, and the rules it must obey. Choosing a model shapes everything — how you store data, how you query it, and how painful change will be later. Let's build the whole picture from the ground up.

💡

The Core Idea

A data model is not the data itself — it is the blueprint for how data is structured and related. The same real-world facts can be modelled as a tree, a graph, or a set of tables, and the choice has consequences for decades.

Section 02

What Exactly Is a Data Model?

Formally, a data model is a collection of concepts used to describe the structure of a database — the data types, relationships, and constraints — plus a set of operations for retrieving and updating that data. Data models are grouped by how close they sit to the machine.

🧠

High-Level (Conceptual)

close to how users think

Describe data using entities, attributes, and relationships in business terms. The ER model is the classic example. Used during design, before worrying about storage.

📄

Representational (Implementation)

close to how the DBMS works

Still understandable by users, but maps onto real DBMS structures. The relational, hierarchical, and network models all live here — they are record-based representational models.

💾

Low-Level (Physical)

close to the disk

Describe how data is physically stored — record formats, orderings, access paths, and indexes. Meant for system specialists, hidden from ordinary users.

📚 Two Words You Must Never Confuse

Schema

The description of the database — its structure and rules. Defined once, changes rarely. Think of it as the blueprint of a building.

Instance

The actual data in the database at a particular moment — also called the database state. Changes every time you insert, update, or delete. Think of the people currently inside the building.

Relationship

One schema, infinitely many possible instances over time. The blueprint stays fixed while occupants come and go.

Section 03

The Three Record-Based Models — A Map

Three representational models defined the history of databases. They appeared in roughly this order, each fixing a weakness of the one before.

Hierarchical Model (1960s)

Data as a tree of parent–child links. Pioneered by IBM's IMS for the Apollo program. Simple and fast for one-to-many, but rigid.

Network Model (late 1960s–70s)

Data as a graph of owner–member sets (CODASYL / DBTG standard). Handles many-to-many, but the pointer maze is hard to navigate and maintain.

Relational Model (1970 → today)

Data as simple tables linked by matching values, proposed by E. F. Codd. No pointers, declarative SQL, strong theory. It won — and still dominates.

Section 04

The Hierarchical Model

In the hierarchical model, data is organized as a tree. Each record (except the root) has exactly one parent, and a parent may have many children — a strict one-to-many (1:N) relationship. To find any record you must travel down from the root.

🌳 Hierarchical Model — A Tree of Records

■ Root ■ Parent (1:N) ■ Child records ● Access pointer (top-down only)

A department has many courses (1:N), but a course belongs to exactly one department. To reach "DBMS" the DBMS engine walks University → CSE → DBMS.

✅

Strengths

simple & fast

Conceptually simple, fast for 1:N navigation, enforces parent-child integrity automatically.

Great when data is naturally tree-shaped (org charts, file systems, XML).

❌

Weaknesses

rigid structure

Cannot represent many-to-many naturally. A child cannot have two parents without duplicating data.

Structural changes force application rewrites; redundancy creeps in.

🏢

Real Example

IBM IMS

IBM's Information Management System (IMS), built for NASA's Apollo program, is the classic hierarchical DBMS — still running in some banks today.

⚠️

The Many-to-Many Problem

What if a course is taught across two departments, or a student belongs to two clubs? A pure tree cannot express this. The hierarchical model forces you to duplicate the record under each parent — reintroducing exactly the redundancy databases were meant to kill.

Section 05

The Network Model

The network model fixes the tree's biggest flaw. Instead of a tree, data is a graph: a record can have many parents (owners) and many children (members), so many-to-many (M:N) relationships are expressed directly. Relationships are called sets, standardized by the CODASYL DBTG committee.

🕸️ Network Model — A Graph of Owner–Member Sets

■ Owner records (Courses) ■ Member records (Students) ● Set links (M:N)

Lines cross freely: one student links to several courses and one course links to several students. This direct M:N support is the network model's whole reason for existing.

✅

Strengths

flexible relationships

Models M:N directly, navigates fast via pointers, less redundancy than hierarchical.

Powerful for richly interconnected data.

❌

Weaknesses

pointer maze

Very complex to design and maintain. Programs must navigate physical pointers record by record.

Almost no data independence — structure changes break programs.

🏢

Real Example

IDMS / CODASYL

The CODASYL DBTG standard and products like IDMS implemented the network model on mainframes.

Section 06

The Relational Model

In 1970, E. F. Codd proposed something radical: forget pointers and trees — store everything as relations (tables). A relation is a set of tuples (rows), each with the same attributes (columns). Relationships are not pointers but shared values — a value in one table that matches a key in another.

📖 Story

The Idea That Changed Everything

Codd's insight was almost insultingly simple: people already understand tables. A spreadsheet of students, a spreadsheet of courses, and a third linking the two by matching IDs can express any tree or graph — with rock-solid mathematics (set theory and relational algebra) underneath. No navigation, no pointers: you just describe what you want and the DBMS figures out how to get it. That declarative leap gave us SQL, and the relational model never looked back.

The Same Data, As Relations

📝 STUDENT relation

roll	name	dept
1	Riya	CSE
2	Aman	ECE
3	Neha	CSE

📝 ENROLL relation (links by roll)

roll	course	grade
1	DBMS	A
1	OS	B
3	DBMS	A

The M:N relationship that needed a pointer maze in the network model is here expressed by a plain matching value: ENROLL.roll matches STUDENT.roll. No pointers, no duplication of the student's name.

🔑 Core Relational Vocabulary

Relation

A table, e.g. STUDENT.

Tuple

A row of the table — one student.

Attribute

A column, e.g. name.

Domain

The set of allowed values for an attribute (e.g. dept ∈ {CSE, ECE, ...}).

Degree

Number of attributes (columns) in the relation.

Cardinality

Number of tuples (rows) currently in the relation.

Primary Key

Attribute(s) that uniquely identify each tuple, e.g. roll.

Foreign Key

An attribute referencing a key in another relation, e.g. ENROLL.roll.

🏆

Why It Won

The relational model offers simplicity (just tables), a rigorous mathematical foundation, declarative querying via SQL, and the strongest data independence of the three. Hierarchical and network are now mostly history; relational (and its NoSQL descendants) runs the modern world.

Section 07

Hierarchical vs Network vs Relational

Property	Hierarchical	Network	Relational
Structure	Tree	Graph	Tables (relations)
Relationship support	1:N only	1:N and M:N	1:1, 1:N, M:N via keys
Links implemented by	Physical pointers	Physical pointers	Matching values
Access method	Navigate from root	Navigate pointers	Declarative (SQL)
Data independence	Very low	Low	High
Redundancy	High (duplication)	Moderate	Controlled
Ease of use	Moderate	Hard (complex)	Easy
Theoretical basis	None formal	None formal	Set theory / relational algebra
Example systems	IBM IMS	IDMS, CODASYL	MySQL, PostgreSQL, Oracle
Era	1960s	Late 1960s–70s	1970 → today

🔎

One Line To Remember

Hierarchical = tree of pointers, Network = graph of pointers, Relational = tables linked by values. Pointers tie data to storage; values set it free.

Section 08

The Three-Schema Architecture

Knowing the relational model is great — but how does a DBMS keep users insulated from storage details? The answer is the ANSI/SPARC Three-Schema Architecture (1975), which separates a database description into three levels of abstraction. This separation is the engine that makes data independence possible.

🧱 ANSI/SPARC Three-Schema Architecture

■ External level (user views) ■ Conceptual level (whole DB) ■ Internal level (storage)

Three users see three different external views. All map to one conceptual schema, which in turn maps to one internal storage scheme. Users never touch the green layer directly.

👁️

External Level

views / subschemas

Many external schemas (views), one per user group. Each describes only the part of the database that user needs and hides the rest. The clerk never sees salary columns.

🌐

Conceptual Level

the whole community view

One conceptual schema describing the entire database for the whole organization — all entities, relationships, and constraints — but hiding physical storage details.

💾

Internal Level

physical storage

One internal schema describing how data is physically stored: file structures, record placement, indexes, and access paths. Closest to the hardware.

Section 09

The Two Mappings

The three levels are glued together by two mappings. These mappings are what let the DBMS translate a user's request, expressed against a view, all the way down to bytes on disk — and the answer back up.

🔗 How a Request Travels Through the Levels

Mapping 1

External / Conceptual: connects each user view to the conceptual schema. Lets the DBMS rewrite a query on a view into a query on the full database.

Mapping 2

Conceptual / Internal: connects the conceptual schema to the internal (storage) schema. Lets the DBMS find where on disk the requested data actually lives.

Result

A query written against a simple view is silently translated down two levels, executed on the stored bytes, and the result is mapped back up — all invisible to the user.

Section 10

Data Independence — The Whole Point

Data independence is the capacity to change the schema at one level without having to change the schema at the next higher level. The three-schema architecture exists precisely to deliver it. There are two kinds.

🛡️ Logical vs Physical Data Independence

■ External ■ Conceptual ■ Internal 🛡️ change absorbed here

The shields sit on the mappings. Adjust the mapping and a lower-level change stops there, never rippling upward to break user programs.

🔌

Logical Data Independence

conceptual changes → views safe

The ability to change the conceptual schema (add a table or column, split a relation) without changing external views or application programs. Harder to achieve in practice.

e.g. add an email column — old views still work.

💾

Physical Data Independence

storage changes → conceptual safe

The ability to change the internal schema (add an index, reorganize files, move to a new disk) without changing the conceptual schema. Easier and very common.

e.g. add an index — queries just run faster, nothing else changes.

🎯

Why It Matters

change without breakage

Data independence is the cure for the file system's fatal flaw. Programs survive storage and structure changes, so the database can evolve for decades without endless rewrites.

📐

Easy Mnemonic

Logical independence protects against Logical (conceptual) changes. Physical independence protects against Physical (storage) changes. The protected level above never has to care.

Section 11

Schema vs Instance — A Concrete Look

📄 SCHEMA (the blueprint, rarely changes)

STUDENT(roll, name, dept)

ENROLL(roll, course, grade)

constraint: ENROLL.roll → STUDENT.roll

constraint: dept ∈ {CSE, ECE, ME}

📊 INSTANCE (the data right now, changes often)

(1, Riya, CSE), (2, Aman, ECE), (3, Neha, CSE)

(1, DBMS, A), (1, OS, B), (3, DBMS, A)

… tomorrow a new student is added …

… the schema above stays exactly the same.

Section 12

Hands-On — Relational Model & Data Independence in Python

Let's make the abstract concrete. Using sqlite3, we build the relational tables, create a view (an external schema), then demonstrate logical and physical data independence — proving the view survives change.

Building the Relational Schema and an External View

import sqlite3

conn = sqlite3.connect(':memory:')   # temporary in-memory database
cur  = conn.cursor()

# --- CONCEPTUAL SCHEMA: the whole database, as relations ---
cur.executescript("""
    CREATE TABLE student (
        roll INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL
    );
    CREATE TABLE enroll (
        roll   INTEGER,
        course TEXT,
        grade  TEXT,
        FOREIGN KEY (roll) REFERENCES student(roll)
    );
""")

cur.executemany("INSERT INTO student VALUES (?,?,?)",
    [(1, 'Riya', 'CSE'), (2, 'Aman', 'ECE'), (3, 'Neha', 'CSE')])
cur.executemany("INSERT INTO enroll VALUES (?,?,?)",
    [(1, 'DBMS', 'A'), (1, 'OS', 'B'), (3, 'DBMS', 'A')])

# --- EXTERNAL SCHEMA: a view for the CSE faculty (only what they need) ---
cur.execute("""
    CREATE VIEW cse_grades AS
    SELECT s.name, e.course, e.grade
    FROM   student s JOIN enroll e ON s.roll = e.roll
    WHERE  s.dept = 'CSE'
""")

print("CSE faculty view:")
for row in cur.execute("SELECT * FROM cse_grades"):
    print("  ", row)

OUTPUT

CSE faculty view: ('Riya', 'DBMS', 'A') ('Riya', 'OS', 'B') ('Neha', 'DBMS', 'A')

Proving Logical Data Independence

Now change the conceptual schema — add a brand-new column. The faculty's external view does not mention the new column, so it keeps working unchanged.

# Change the CONCEPTUAL schema: add a column the view never referenced
cur.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The external view is untouched — logical data independence in action
print("Same view, after adding student.email:")
for row in cur.execute("SELECT * FROM cse_grades"):
    print("  ", row)

OUTPUT

Same view, after adding student.email: ('Riya', 'DBMS', 'A') ('Riya', 'OS', 'B') ('Neha', 'DBMS', 'A')

Proving Physical Data Independence

# Change the INTERNAL schema: add an index (a pure storage decision)
cur.execute("CREATE INDEX idx_enroll_roll ON enroll(roll)")

# Conceptual schema and query results are IDENTICAL — only speed changed
print("Query result is unchanged; only the access path improved.")
conn.close()

OUTPUT

Query result is unchanged; only the access path improved.

🎯

What You Just Witnessed

Adding a column (a conceptual change) did not break the view — that is logical data independence. Adding an index (an internal change) did not alter results — that is physical data independence. The three-schema architecture made both possible.

Section 13

Bringing It Together

📖 The Story, Resolved

Why Cousin C Sleeps Soundly

Cousin A's tree (hierarchical) and Cousin B's web (network) both hard-wired relationships into pointers — change the structure and every program shatters. Cousin C's tables (relational) linked data by values, and the three-schema architecture wrapped those tables in three layers of abstraction.

Now the accountant, the clerk, and the faculty each get their own external view. The organization keeps one conceptual schema. The DBA tunes the internal storage freely. A column can be added, an index built, a disk swapped — and nobody's program breaks. That quiet resilience is the modern database, and it is built on exactly the ideas in this tutorial.

Section 14

Golden Rules

📚 Data Models & Architecture — Key Takeaways

A data model describes structure, relationships, constraints, and operations. Models range from high-level (ER) to representational (relational, network, hierarchical) to low-level (physical).

Hierarchical = tree, 1:N only, navigate from the root. Simple but rigid and cannot do M:N without duplication (IBM IMS).

Network = graph of owner–member sets, supports M:N, but is a complex pointer maze with poor data independence (CODASYL / IDMS).

Relational = tables linked by matching values, with a rigorous mathematical basis and declarative SQL. It offers the strongest data independence and dominates today (Codd, 1970).

The three-schema architecture separates external (user views), conceptual (whole DB), and internal (storage) levels, glued by two mappings.

Logical data independence: change the conceptual schema without breaking external views. Physical data independence: change storage without touching the conceptual schema.

Schema is the blueprint (changes rarely); an instance is the data at a moment (changes constantly). One schema, infinitely many instances over time.