DBMS
📂 Introduction to Databases
· 2 of 2
36 min read
Data Models in DBMS: Hierarchical, Network, Relational, the Three-Schema Architecture & Data Independence
A detailed introduction to how databases describe data. It compares the three record-based data models — hierarchical (tree), network (graph), and relational (tables) — explaining their structures, relationship support, strengths, and weaknesses with diagrams and examples. It then covers the ANSI/SPARC three-schema architecture (external, conceptual, internal levels) and its two mappings, and shows how this layering delivers logical and physical data independence.
Section 01
The Story That Explains Data Models
📖 Real World Analogy
Three Ways to Draw the Same Family
Imagine you must record the people in a large joint family so anyone can look them up.
Cousin A draws a family tree — one ancestor at the top, branches fanning
downward, every person having exactly one parent above them. Clean, but it cannot show that two
cousins married into the family from two different sides.
Cousin B says "people connect in many directions" and draws a web of arrows
— anyone can link to anyone. It captures every relationship, but after fifty people the diagram
looks like a plate of noodles, and finding one person means tracing arrows by hand.
Cousin C shrugs and makes a simple table: one row per person, columns for
name, parent, and spouse. Relationships are just matching values between rows. Boring to look at —
but trivial to search, change, and reason about.
Those three cousins just reinvented the hierarchical, network, and
relational data models. This tutorial is the story of why Cousin C won.
A data model is the lens through which we describe data, its relationships, its
meaning, and the rules it must obey. Choosing a model shapes everything — how you store data, how you
query it, and how painful change will be later. Let's build the whole picture from the ground up.
💡
The Core Idea
A data model is not the data itself — it is the blueprint for how data is structured and
related. The same real-world facts can be modelled as a tree, a graph, or a set of tables, and the
choice has consequences for decades.
Section 02
What Exactly Is a Data Model?
Formally, a data model is a collection of concepts used to describe the
structure of a database — the data types, relationships, and constraints — plus a set of
operations for retrieving and updating that data. Data models are grouped by how close they
sit to the machine.
🧠
High-Level (Conceptual)
close to how users think
Describe data using entities, attributes, and relationships in business terms.
The ER model is the classic example. Used during design, before worrying about storage.
📄
Representational (Implementation)
close to how the DBMS works
Still understandable by users, but maps onto real DBMS structures. The
relational, hierarchical, and network models all live here — they are
record-based representational models.
💾
Low-Level (Physical)
close to the disk
Describe how data is physically stored — record formats, orderings, access
paths, and indexes. Meant for system specialists, hidden from ordinary users.
📚 Two Words You Must Never Confuse
Schema
The description of the database — its structure and rules. Defined once, changes rarely. Think of it as the blueprint of a building.
Instance
The actual data in the database at a particular moment — also called the database state. Changes every time you insert, update, or delete. Think of the people currently inside the building.
Relationship
One schema, infinitely many possible instances over time. The blueprint stays fixed while occupants come and go.
Section 03
The Three Record-Based Models — A Map
Three representational models defined the history of databases. They appeared in roughly this order,
each fixing a weakness of the one before.
01
Hierarchical Model (1960s)
Data as a tree of parent–child links. Pioneered by IBM's IMS for the Apollo program. Simple and fast for one-to-many, but rigid.
02
Network Model (late 1960s–70s)
Data as a graph of owner–member sets (CODASYL / DBTG standard). Handles many-to-many, but the pointer maze is hard to navigate and maintain.
03
Relational Model (1970 → today)
Data as simple tables linked by matching values, proposed by E. F. Codd. No pointers, declarative SQL, strong theory. It won — and still dominates.
Section 04
The Hierarchical Model
In the hierarchical model, data is organized as a tree. Each record (except the root)
has exactly one parent, and a parent may have many children — a strict
one-to-many (1:N) relationship. To find any record you must travel down from the root.
A department has many courses (1:N), but a course belongs to exactly one department. To reach "DBMS" the DBMS engine walks University → CSE → DBMS.
✅
Strengths
simple & fast
Conceptually simple, fast for 1:N navigation, enforces parent-child integrity automatically.
Great when data is naturally tree-shaped (org charts, file systems, XML).
❌
Weaknesses
rigid structure
Cannot represent many-to-many naturally. A child cannot have two parents without duplicating data.
Structural changes force application rewrites; redundancy creeps in.
🏢
Real Example
IBM IMS
IBM's Information Management System (IMS), built for NASA's Apollo program, is the classic hierarchical DBMS — still running in some banks today.
⚠️
The Many-to-Many Problem
What if a course is taught across two departments, or a student belongs to two clubs?
A pure tree cannot express this. The hierarchical model forces you to duplicate
the record under each parent — reintroducing exactly the redundancy databases were meant to kill.
Section 05
The Network Model
The network model fixes the tree's biggest flaw. Instead of a tree, data is a graph:
a record can have many parents (owners) and many children (members), so
many-to-many (M:N) relationships are expressed directly. Relationships are called
sets, standardized by the CODASYL DBTG committee.
🕸️ Network Model — A Graph of Owner–Member Sets
■ Owner records (Courses)■ Member records (Students)● Set links (M:N)
Lines cross freely: one student links to several courses and one course links to several students. This direct M:N support is the network model's whole reason for existing.
✅
Strengths
flexible relationships
Models M:N directly, navigates fast via pointers, less redundancy than hierarchical.
Powerful for richly interconnected data.
❌
Weaknesses
pointer maze
Very complex to design and maintain. Programs must navigate physical pointers record by record.
Almost no data independence — structure changes break programs.
🏢
Real Example
IDMS / CODASYL
The CODASYL DBTG standard and products like IDMS implemented the network model on mainframes.
Section 06
The Relational Model
In 1970, E. F. Codd proposed something radical: forget pointers and trees — store
everything as relations (tables). A relation is a set of tuples (rows),
each with the same attributes (columns). Relationships are not pointers but
shared values — a value in one table that matches a key in another.
📖 Story
The Idea That Changed Everything
Codd's insight was almost insultingly simple: people already understand tables. A spreadsheet of
students, a spreadsheet of courses, and a third linking the two by matching IDs can express any tree
or graph — with rock-solid mathematics (set theory and relational algebra) underneath. No
navigation, no pointers: you just describe what you want and the DBMS figures out how to get it.
That declarative leap gave us SQL, and the relational model never looked back.
The Same Data, As Relations
📝 STUDENT relation
roll
name
dept
1
Riya
CSE
2
Aman
ECE
3
Neha
CSE
📝 ENROLL relation (links by roll)
roll
course
grade
1
DBMS
A
1
OS
B
3
DBMS
A
The M:N relationship that needed a pointer maze in the network model is here expressed by a plain
matching value: ENROLL.roll matches STUDENT.roll. No pointers,
no duplication of the student's name.
🔑 Core Relational Vocabulary
Relation
A table, e.g. STUDENT.
Tuple
A row of the table — one student.
Attribute
A column, e.g. name.
Domain
The set of allowed values for an attribute (e.g. dept ∈ {CSE, ECE, ...}).
Degree
Number of attributes (columns) in the relation.
Cardinality
Number of tuples (rows) currently in the relation.
Primary Key
Attribute(s) that uniquely identify each tuple, e.g. roll.
Foreign Key
An attribute referencing a key in another relation, e.g. ENROLL.roll.
🏆
Why It Won
The relational model offers simplicity (just tables), a rigorous
mathematical foundation, declarative querying via SQL, and the
strongest data independence of the three. Hierarchical and network are now mostly
history; relational (and its NoSQL descendants) runs the modern world.
Section 07
Hierarchical vs Network vs Relational
Property
Hierarchical
Network
Relational
Structure
Tree
Graph
Tables (relations)
Relationship support
1:N only
1:N and M:N
1:1, 1:N, M:N via keys
Links implemented by
Physical pointers
Physical pointers
Matching values
Access method
Navigate from root
Navigate pointers
Declarative (SQL)
Data independence
Very low
Low
High
Redundancy
High (duplication)
Moderate
Controlled
Ease of use
Moderate
Hard (complex)
Easy
Theoretical basis
None formal
None formal
Set theory / relational algebra
Example systems
IBM IMS
IDMS, CODASYL
MySQL, PostgreSQL, Oracle
Era
1960s
Late 1960s–70s
1970 → today
🔎
One Line To Remember
Hierarchical = tree of pointers, Network = graph of pointers,
Relational = tables linked by values. Pointers tie data to storage; values set it free.
Section 08
The Three-Schema Architecture
Knowing the relational model is great — but how does a DBMS keep users insulated from storage details?
The answer is the ANSI/SPARC Three-Schema Architecture (1975), which separates a
database description into three levels of abstraction. This separation is the engine
that makes data independence possible.
Three users see three different external views. All map to one conceptual schema, which in turn maps to one internal storage scheme. Users never touch the green layer directly.
👁️
External Level
views / subschemas
Many external schemas (views), one per user group. Each describes only the part of the
database that user needs and hides the rest. The clerk never sees salary columns.
🌐
Conceptual Level
the whole community view
One conceptual schema describing the entire database for the whole organization —
all entities, relationships, and constraints — but hiding physical storage details.
💾
Internal Level
physical storage
One internal schema describing how data is physically stored: file
structures, record placement, indexes, and access paths. Closest to the hardware.
Section 09
The Two Mappings
The three levels are glued together by two mappings. These mappings are what let the
DBMS translate a user's request, expressed against a view, all the way down to bytes on disk — and the
answer back up.
🔗 How a Request Travels Through the Levels
Mapping 1
External / Conceptual: connects each user view to the conceptual schema. Lets the DBMS rewrite a query on a view into a query on the full database.
Mapping 2
Conceptual / Internal: connects the conceptual schema to the internal (storage) schema. Lets the DBMS find where on disk the requested data actually lives.
Result
A query written against a simple view is silently translated down two levels, executed on the stored bytes, and the result is mapped back up — all invisible to the user.
Section 10
Data Independence — The Whole Point
Data independence is the capacity to change the schema at one level
without having to change the schema at the next higher level. The three-schema architecture
exists precisely to deliver it. There are two kinds.
🛡️ Logical vs Physical Data Independence
■ External■ Conceptual■ Internal🛡️ change absorbed here
The shields sit on the mappings. Adjust the mapping and a lower-level change stops there, never rippling upward to break user programs.
🔌
Logical Data Independence
conceptual changes → views safe
The ability to change the conceptual schema (add a table or column, split a relation)
without changing external views or application programs. Harder to achieve in practice.
e.g. add an email column — old views still work.
💾
Physical Data Independence
storage changes → conceptual safe
The ability to change the internal schema (add an index, reorganize files, move to a
new disk) without changing the conceptual schema. Easier and very common.
e.g. add an index — queries just run faster, nothing else changes.
🎯
Why It Matters
change without breakage
Data independence is the cure for the file system's fatal flaw. Programs survive storage and
structure changes, so the database can evolve for decades without endless rewrites.
📐
Easy Mnemonic
Logical independence protects against Logical (conceptual) changes.
Physical independence protects against Physical (storage) changes.
The protected level above never has to care.
Section 11
Schema vs Instance — A Concrete Look
📄 SCHEMA (the blueprint, rarely changes)
STUDENT(roll, name, dept)
ENROLL(roll, course, grade)
constraint: ENROLL.roll → STUDENT.roll
constraint: dept ∈ {CSE, ECE, ME}
📊 INSTANCE (the data right now, changes often)
(1, Riya, CSE), (2, Aman, ECE), (3, Neha, CSE)
(1, DBMS, A), (1, OS, B), (3, DBMS, A)
… tomorrow a new student is added …
… the schema above stays exactly the same.
Section 12
Hands-On — Relational Model & Data Independence in Python
Let's make the abstract concrete. Using sqlite3, we build the relational tables, create a
view (an external schema), then demonstrate logical and
physical data independence — proving the view survives change.
Building the Relational Schema and an External View
import sqlite3
conn = sqlite3.connect(':memory:') # temporary in-memory database
cur = conn.cursor()
# --- CONCEPTUAL SCHEMA: the whole database, as relations ---
cur.executescript(""" CREATE TABLE student ( roll INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL ); CREATE TABLE enroll ( roll INTEGER, course TEXT, grade TEXT, FOREIGN KEY (roll) REFERENCES student(roll) );""")
cur.executemany("INSERT INTO student VALUES (?,?,?)",
[(1, 'Riya', 'CSE'), (2, 'Aman', 'ECE'), (3, 'Neha', 'CSE')])
cur.executemany("INSERT INTO enroll VALUES (?,?,?)",
[(1, 'DBMS', 'A'), (1, 'OS', 'B'), (3, 'DBMS', 'A')])
# --- EXTERNAL SCHEMA: a view for the CSE faculty (only what they need) ---
cur.execute(""" CREATE VIEW cse_grades AS SELECT s.name, e.course, e.grade FROM student s JOIN enroll e ON s.roll = e.roll WHERE s.dept = 'CSE'""")
print("CSE faculty view:")
for row in cur.execute("SELECT * FROM cse_grades"):
print(" ", row)
Now change the conceptual schema — add a brand-new column. The faculty's external view
does not mention the new column, so it keeps working unchanged.
# Change the CONCEPTUAL schema: add a column the view never referenced
cur.execute("ALTER TABLE student ADD COLUMN email TEXT")
# The external view is untouched — logical data independence in actionprint("Same view, after adding student.email:")
for row in cur.execute("SELECT * FROM cse_grades"):
print(" ", row)
OUTPUT
Same view, after adding student.email:
('Riya', 'DBMS', 'A')
('Riya', 'OS', 'B')
('Neha', 'DBMS', 'A')
Proving Physical Data Independence
# Change the INTERNAL schema: add an index (a pure storage decision)
cur.execute("CREATE INDEX idx_enroll_roll ON enroll(roll)")
# Conceptual schema and query results are IDENTICAL — only speed changedprint("Query result is unchanged; only the access path improved.")
conn.close()
OUTPUT
Query result is unchanged; only the access path improved.
🎯
What You Just Witnessed
Adding a column (a conceptual change) did not break the view — that is
logical data independence. Adding an index (an internal change)
did not alter results — that is physical data independence. The three-schema
architecture made both possible.
Section 13
Bringing It Together
📖 The Story, Resolved
Why Cousin C Sleeps Soundly
Cousin A's tree (hierarchical) and Cousin B's web (network) both hard-wired relationships into
pointers — change the structure and every program shatters. Cousin C's tables (relational)
linked data by values, and the three-schema architecture wrapped those tables in three layers
of abstraction.
Now the accountant, the clerk, and the faculty each get their own external view. The
organization keeps one conceptual schema. The DBA tunes the internal storage
freely. A column can be added, an index built, a disk swapped — and nobody's program breaks. That quiet
resilience is the modern database, and it is built on exactly the ideas in this tutorial.
Section 14
Golden Rules
📚 Data Models & Architecture — Key Takeaways
1
A data model describes structure, relationships, constraints, and operations.
Models range from high-level (ER) to representational (relational, network,
hierarchical) to low-level (physical).
2
Hierarchical = tree, 1:N only, navigate from the root. Simple but rigid and cannot
do M:N without duplication (IBM IMS).
3
Network = graph of owner–member sets, supports M:N, but is a complex pointer
maze with poor data independence (CODASYL / IDMS).
4
Relational = tables linked by matching values, with a rigorous mathematical
basis and declarative SQL. It offers the strongest data independence and dominates today (Codd, 1970).
5
The three-schema architecture separates external (user views),
conceptual (whole DB), and internal (storage) levels, glued by two
mappings.
6
Logical data independence: change the conceptual schema without breaking external
views. Physical data independence: change storage without touching the conceptual
schema.
7
Schema is the blueprint (changes rarely); an instance is the data at
a moment (changes constantly). One schema, infinitely many instances over time.