DBMS 📂 Introduction to Databases · 2 of 2 36 min read

Data Models in DBMS: Hierarchical, Network, Relational, the Three-Schema Architecture & Data Independence

A detailed introduction to how databases describe data. It compares the three record-based data models — hierarchical (tree), network (graph), and relational (tables) — explaining their structures, relationship support, strengths, and weaknesses with diagrams and examples. It then covers the ANSI/SPARC three-schema architecture (external, conceptual, internal levels) and its two mappings, and shows how this layering delivers logical and physical data independence.

Section 01

The Story That Explains Data Models

Three Ways to Draw the Same Family
Imagine you must record the people in a large joint family so anyone can look them up. Cousin A draws a family tree — one ancestor at the top, branches fanning downward, every person having exactly one parent above them. Clean, but it cannot show that two cousins married into the family from two different sides.

Cousin B says "people connect in many directions" and draws a web of arrows — anyone can link to anyone. It captures every relationship, but after fifty people the diagram looks like a plate of noodles, and finding one person means tracing arrows by hand.

Cousin C shrugs and makes a simple table: one row per person, columns for name, parent, and spouse. Relationships are just matching values between rows. Boring to look at — but trivial to search, change, and reason about.

Those three cousins just reinvented the hierarchical, network, and relational data models. This tutorial is the story of why Cousin C won.

A data model is the lens through which we describe data, its relationships, its meaning, and the rules it must obey. Choosing a model shapes everything — how you store data, how you query it, and how painful change will be later. Let's build the whole picture from the ground up.

💡
The Core Idea

A data model is not the data itself — it is the blueprint for how data is structured and related. The same real-world facts can be modelled as a tree, a graph, or a set of tables, and the choice has consequences for decades.


Section 02

What Exactly Is a Data Model?

Formally, a data model is a collection of concepts used to describe the structure of a database — the data types, relationships, and constraints — plus a set of operations for retrieving and updating that data. Data models are grouped by how close they sit to the machine.

🧠
High-Level (Conceptual)
close to how users think
Describe data using entities, attributes, and relationships in business terms. The ER model is the classic example. Used during design, before worrying about storage.
📄
Representational (Implementation)
close to how the DBMS works
Still understandable by users, but maps onto real DBMS structures. The relational, hierarchical, and network models all live here — they are record-based representational models.
💾
Low-Level (Physical)
close to the disk
Describe how data is physically stored — record formats, orderings, access paths, and indexes. Meant for system specialists, hidden from ordinary users.
📚 Two Words You Must Never Confuse
Schema
The description of the database — its structure and rules. Defined once, changes rarely. Think of it as the blueprint of a building.
Instance
The actual data in the database at a particular moment — also called the database state. Changes every time you insert, update, or delete. Think of the people currently inside the building.
Relationship
One schema, infinitely many possible instances over time. The blueprint stays fixed while occupants come and go.

Section 03

The Three Record-Based Models — A Map

Three representational models defined the history of databases. They appeared in roughly this order, each fixing a weakness of the one before.

01
Hierarchical Model (1960s)
Data as a tree of parent–child links. Pioneered by IBM's IMS for the Apollo program. Simple and fast for one-to-many, but rigid.
02
Network Model (late 1960s–70s)
Data as a graph of owner–member sets (CODASYL / DBTG standard). Handles many-to-many, but the pointer maze is hard to navigate and maintain.
03
Relational Model (1970 → today)
Data as simple tables linked by matching values, proposed by E. F. Codd. No pointers, declarative SQL, strong theory. It won — and still dominates.

Section 04

The Hierarchical Model

In the hierarchical model, data is organized as a tree. Each record (except the root) has exactly one parent, and a parent may have many children — a strict one-to-many (1:N) relationship. To find any record you must travel down from the root.

🌳 Hierarchical Model — A Tree of Records
■ Root ■ Parent (1:N) ■ Child records ● Access pointer (top-down only)
University Dept: CSE Dept: ECE DBMS OS VLSI Signals Every child has exactly ONE parent — access always begins at the root

A department has many courses (1:N), but a course belongs to exactly one department. To reach "DBMS" the DBMS engine walks University → CSE → DBMS.

Strengths
simple & fast
Conceptually simple, fast for 1:N navigation, enforces parent-child integrity automatically.
Great when data is naturally tree-shaped (org charts, file systems, XML).
Weaknesses
rigid structure
Cannot represent many-to-many naturally. A child cannot have two parents without duplicating data.
Structural changes force application rewrites; redundancy creeps in.
🏢
Real Example
IBM IMS
IBM's Information Management System (IMS), built for NASA's Apollo program, is the classic hierarchical DBMS — still running in some banks today.
⚠️
The Many-to-Many Problem

What if a course is taught across two departments, or a student belongs to two clubs? A pure tree cannot express this. The hierarchical model forces you to duplicate the record under each parent — reintroducing exactly the redundancy databases were meant to kill.


Section 05

The Network Model

The network model fixes the tree's biggest flaw. Instead of a tree, data is a graph: a record can have many parents (owners) and many children (members), so many-to-many (M:N) relationships are expressed directly. Relationships are called sets, standardized by the CODASYL DBTG committee.

🕸️ Network Model — A Graph of Owner–Member Sets
■ Owner records (Courses) ■ Member records (Students) ● Set links (M:N)
C1: DBMS C2: OS S1: Riya S2: Aman S3: Neha Riya and Neha each belong to TWO courses — a member can have MANY owners (impossible in a tree)

Lines cross freely: one student links to several courses and one course links to several students. This direct M:N support is the network model's whole reason for existing.

Strengths
flexible relationships
Models M:N directly, navigates fast via pointers, less redundancy than hierarchical.
Powerful for richly interconnected data.
Weaknesses
pointer maze
Very complex to design and maintain. Programs must navigate physical pointers record by record.
Almost no data independence — structure changes break programs.
🏢
Real Example
IDMS / CODASYL
The CODASYL DBTG standard and products like IDMS implemented the network model on mainframes.

Section 06

The Relational Model

In 1970, E. F. Codd proposed something radical: forget pointers and trees — store everything as relations (tables). A relation is a set of tuples (rows), each with the same attributes (columns). Relationships are not pointers but shared values — a value in one table that matches a key in another.

The Idea That Changed Everything
Codd's insight was almost insultingly simple: people already understand tables. A spreadsheet of students, a spreadsheet of courses, and a third linking the two by matching IDs can express any tree or graph — with rock-solid mathematics (set theory and relational algebra) underneath. No navigation, no pointers: you just describe what you want and the DBMS figures out how to get it. That declarative leap gave us SQL, and the relational model never looked back.

The Same Data, As Relations

📝 STUDENT relation
rollnamedept
1RiyaCSE
2AmanECE
3NehaCSE
📝 ENROLL relation (links by roll)
rollcoursegrade
1DBMSA
1OSB
3DBMSA

The M:N relationship that needed a pointer maze in the network model is here expressed by a plain matching value: ENROLL.roll matches STUDENT.roll. No pointers, no duplication of the student's name.

🔑 Core Relational Vocabulary
Relation
A table, e.g. STUDENT.
Tuple
A row of the table — one student.
Attribute
A column, e.g. name.
Domain
The set of allowed values for an attribute (e.g. dept ∈ {CSE, ECE, ...}).
Degree
Number of attributes (columns) in the relation.
Cardinality
Number of tuples (rows) currently in the relation.
Primary Key
Attribute(s) that uniquely identify each tuple, e.g. roll.
Foreign Key
An attribute referencing a key in another relation, e.g. ENROLL.roll.
🏆
Why It Won

The relational model offers simplicity (just tables), a rigorous mathematical foundation, declarative querying via SQL, and the strongest data independence of the three. Hierarchical and network are now mostly history; relational (and its NoSQL descendants) runs the modern world.


Section 07

Hierarchical vs Network vs Relational

PropertyHierarchicalNetworkRelational
StructureTreeGraphTables (relations)
Relationship support1:N only1:N and M:N1:1, 1:N, M:N via keys
Links implemented byPhysical pointersPhysical pointersMatching values
Access methodNavigate from rootNavigate pointersDeclarative (SQL)
Data independenceVery lowLowHigh
RedundancyHigh (duplication)ModerateControlled
Ease of useModerateHard (complex)Easy
Theoretical basisNone formalNone formalSet theory / relational algebra
Example systemsIBM IMSIDMS, CODASYLMySQL, PostgreSQL, Oracle
Era1960sLate 1960s–70s1970 → today
🔎
One Line To Remember

Hierarchical = tree of pointers, Network = graph of pointers, Relational = tables linked by values. Pointers tie data to storage; values set it free.


Section 08

The Three-Schema Architecture

Knowing the relational model is great — but how does a DBMS keep users insulated from storage details? The answer is the ANSI/SPARC Three-Schema Architecture (1975), which separates a database description into three levels of abstraction. This separation is the engine that makes data independence possible.

🧱 ANSI/SPARC Three-Schema Architecture
■ External level (user views) ■ Conceptual level (whole DB) ■ Internal level (storage)
👤 Clerk 👤 Accountant 👤 Faculty External View 1 name, seat External View 2 name, fees External View 3 name, grades ↕ external / conceptual mapping CONCEPTUAL SCHEMA whole DB: entities, relationships, constraints (community view) ↕ conceptual / internal mapping INTERNAL SCHEMA physical storage: file organization, record layout, indexes, access paths stored bytes on disk EXTERNAL CONCEPTUAL INTERNAL

Three users see three different external views. All map to one conceptual schema, which in turn maps to one internal storage scheme. Users never touch the green layer directly.

👁️
External Level
views / subschemas
Many external schemas (views), one per user group. Each describes only the part of the database that user needs and hides the rest. The clerk never sees salary columns.
🌐
Conceptual Level
the whole community view
One conceptual schema describing the entire database for the whole organization — all entities, relationships, and constraints — but hiding physical storage details.
💾
Internal Level
physical storage
One internal schema describing how data is physically stored: file structures, record placement, indexes, and access paths. Closest to the hardware.

Section 09

The Two Mappings

The three levels are glued together by two mappings. These mappings are what let the DBMS translate a user's request, expressed against a view, all the way down to bytes on disk — and the answer back up.

🔗 How a Request Travels Through the Levels
Mapping 1
External / Conceptual: connects each user view to the conceptual schema. Lets the DBMS rewrite a query on a view into a query on the full database.
Mapping 2
Conceptual / Internal: connects the conceptual schema to the internal (storage) schema. Lets the DBMS find where on disk the requested data actually lives.
Result
A query written against a simple view is silently translated down two levels, executed on the stored bytes, and the result is mapped back up — all invisible to the user.

Section 10

Data Independence — The Whole Point

Data independence is the capacity to change the schema at one level without having to change the schema at the next higher level. The three-schema architecture exists precisely to deliver it. There are two kinds.

🛡️ Logical vs Physical Data Independence
■ External ■ Conceptual ■ Internal 🛡️ change absorbed here
EXTERNAL SCHEMAS (user views) CONCEPTUAL SCHEMA INTERNAL SCHEMA (storage) change here… 🛡️ LOGICAL data independence — views unaffected change here… 🛡️ PHYSICAL data independence — conceptual unaffected A change at a lower level is "absorbed" by adjusting only the mapping — the level above never notices.

The shields sit on the mappings. Adjust the mapping and a lower-level change stops there, never rippling upward to break user programs.

🔌
Logical Data Independence
conceptual changes → views safe
The ability to change the conceptual schema (add a table or column, split a relation) without changing external views or application programs. Harder to achieve in practice.
e.g. add an email column — old views still work.
💾
Physical Data Independence
storage changes → conceptual safe
The ability to change the internal schema (add an index, reorganize files, move to a new disk) without changing the conceptual schema. Easier and very common.
e.g. add an index — queries just run faster, nothing else changes.
🎯
Why It Matters
change without breakage
Data independence is the cure for the file system's fatal flaw. Programs survive storage and structure changes, so the database can evolve for decades without endless rewrites.
📐
Easy Mnemonic

Logical independence protects against Logical (conceptual) changes. Physical independence protects against Physical (storage) changes. The protected level above never has to care.


Section 11

Schema vs Instance — A Concrete Look

📄 SCHEMA (the blueprint, rarely changes)
STUDENT(roll, name, dept)
ENROLL(roll, course, grade)
constraint: ENROLL.roll → STUDENT.roll
constraint: dept ∈ {CSE, ECE, ME}
📊 INSTANCE (the data right now, changes often)
(1, Riya, CSE), (2, Aman, ECE), (3, Neha, CSE)
(1, DBMS, A), (1, OS, B), (3, DBMS, A)
… tomorrow a new student is added …
… the schema above stays exactly the same.

Section 12

Hands-On — Relational Model & Data Independence in Python

Let's make the abstract concrete. Using sqlite3, we build the relational tables, create a view (an external schema), then demonstrate logical and physical data independence — proving the view survives change.

Building the Relational Schema and an External View

import sqlite3

conn = sqlite3.connect(':memory:')   # temporary in-memory database
cur  = conn.cursor()

# --- CONCEPTUAL SCHEMA: the whole database, as relations ---
cur.executescript("""
    CREATE TABLE student (
        roll INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL
    );
    CREATE TABLE enroll (
        roll   INTEGER,
        course TEXT,
        grade  TEXT,
        FOREIGN KEY (roll) REFERENCES student(roll)
    );
""")

cur.executemany("INSERT INTO student VALUES (?,?,?)",
    [(1, 'Riya', 'CSE'), (2, 'Aman', 'ECE'), (3, 'Neha', 'CSE')])
cur.executemany("INSERT INTO enroll VALUES (?,?,?)",
    [(1, 'DBMS', 'A'), (1, 'OS', 'B'), (3, 'DBMS', 'A')])

# --- EXTERNAL SCHEMA: a view for the CSE faculty (only what they need) ---
cur.execute("""
    CREATE VIEW cse_grades AS
    SELECT s.name, e.course, e.grade
    FROM   student s JOIN enroll e ON s.roll = e.roll
    WHERE  s.dept = 'CSE'
""")

print("CSE faculty view:")
for row in cur.execute("SELECT * FROM cse_grades"):
    print("  ", row)
OUTPUT
CSE faculty view: ('Riya', 'DBMS', 'A') ('Riya', 'OS', 'B') ('Neha', 'DBMS', 'A')

Proving Logical Data Independence

Now change the conceptual schema — add a brand-new column. The faculty's external view does not mention the new column, so it keeps working unchanged.

# Change the CONCEPTUAL schema: add a column the view never referenced
cur.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The external view is untouched — logical data independence in action
print("Same view, after adding student.email:")
for row in cur.execute("SELECT * FROM cse_grades"):
    print("  ", row)
OUTPUT
Same view, after adding student.email: ('Riya', 'DBMS', 'A') ('Riya', 'OS', 'B') ('Neha', 'DBMS', 'A')

Proving Physical Data Independence

# Change the INTERNAL schema: add an index (a pure storage decision)
cur.execute("CREATE INDEX idx_enroll_roll ON enroll(roll)")

# Conceptual schema and query results are IDENTICAL — only speed changed
print("Query result is unchanged; only the access path improved.")
conn.close()
OUTPUT
Query result is unchanged; only the access path improved.
🎯
What You Just Witnessed

Adding a column (a conceptual change) did not break the view — that is logical data independence. Adding an index (an internal change) did not alter results — that is physical data independence. The three-schema architecture made both possible.


Section 13

Bringing It Together

Why Cousin C Sleeps Soundly
Cousin A's tree (hierarchical) and Cousin B's web (network) both hard-wired relationships into pointers — change the structure and every program shatters. Cousin C's tables (relational) linked data by values, and the three-schema architecture wrapped those tables in three layers of abstraction.

Now the accountant, the clerk, and the faculty each get their own external view. The organization keeps one conceptual schema. The DBA tunes the internal storage freely. A column can be added, an index built, a disk swapped — and nobody's program breaks. That quiet resilience is the modern database, and it is built on exactly the ideas in this tutorial.

Section 14

Golden Rules

📚 Data Models & Architecture — Key Takeaways
1
A data model describes structure, relationships, constraints, and operations. Models range from high-level (ER) to representational (relational, network, hierarchical) to low-level (physical).
2
Hierarchical = tree, 1:N only, navigate from the root. Simple but rigid and cannot do M:N without duplication (IBM IMS).
3
Network = graph of owner–member sets, supports M:N, but is a complex pointer maze with poor data independence (CODASYL / IDMS).
4
Relational = tables linked by matching values, with a rigorous mathematical basis and declarative SQL. It offers the strongest data independence and dominates today (Codd, 1970).
5
The three-schema architecture separates external (user views), conceptual (whole DB), and internal (storage) levels, glued by two mappings.
6
Logical data independence: change the conceptual schema without breaking external views. Physical data independence: change storage without touching the conceptual schema.
7
Schema is the blueprint (changes rarely); an instance is the data at a moment (changes constantly). One schema, infinitely many instances over time.
You have completed Introduction to Databases. View all sections →