Introduction to Databases: DBMS vs File Systems, Characteristics, Advantages & Disadvantages
A beginner-friendly introduction to databases that contrasts the old file-based approach with a modern DBMS. It explains the core vocabulary (data, database, DBMS), the problems files cause (redundancy, inconsistency, weak concurrency and security), the four defining characteristics of the database approach, and the real advantages and disadvantages of a DBMS — illustrated with stories, comparison tables, two animated diagrams, and a runnable Python (sqlite3) example.
Section 01
The Story That Explains Why Databases Exist
📖 Real World Analogy
The Tale of Two Colleges
Imagine a small college where three departments each keep their own
student records. The Hostel office keeps a notebook of names and room numbers.
The Library keeps a separate register of names and borrowed books. The Exams office
keeps yet another file of names and marks.
One day a student, Riya, changes her surname after her father's transfer.
The Hostel updates its notebook. The Library forgets. The Exams office never hears about it.
Now the same person exists as three separate records that no longer agree, and nobody
is sure which one is correct.
A second college next door does it differently. All three offices write to and read
from one shared, well-guarded record system. When Riya's name changes once,
every office instantly sees the truth. No contradictions. No duplicate typing. No confusion.
That second system is a Database Management System (DBMS) — and this entire
tutorial is about why the second college sleeps better at night.
Almost every application you use — banking, ticket booking, social media, your phone's contact
list — sits on top of a database. Before we appreciate the DBMS, we need to understand the messy
world that came before it: the file-based system.
💡
The Core Idea
A DBMS exists to let many users and programs share the same data safely —
without duplication, contradiction, or data loss — while hiding the messy details of
how that data is physically stored.
Section 02
First, the Vocabulary — Data, Database, DBMS
These three words get used interchangeably in casual talk, but in this subject they mean very
precise things. Get these right and everything else clicks into place.
📚 The Three Foundational Terms
Data
Raw, recorded facts with implicit meaning. The number 21, the name Riya, or the date 2025-08-14 are data. On their own they are just values.
Database
A structured, organized collection of related data that models some part of the real world (a "mini-world"), designed for a specific purpose and audience.
DBMS
The software layer that lets users define, create, query, update, and administer the database — e.g. MySQL, PostgreSQL, Oracle, SQLite, MongoDB.
System
Database + DBMS together form a Database System. The data is the content; the DBMS is the manager standing guard over it.
🧠
Easy Way to Remember
Data is the milk. The database is the bottle that organizes and
holds it. The DBMS is the fridge that keeps it safe, lets the right people pour it,
and stops it from spoiling.
Section 03
The Old World — File-Based Systems
Before DBMS software became widespread (roughly before the 1970s), each application program managed its own data
in its own permanent files, stored directly on disk. Every program defined its own
file format and wrote its own code to read and write those files.
01
Each Program Owns Its Own Files
The Payroll program has payroll files. The HR program has HR files. Nobody shares — each program reads and writes only the files it created.
02
File Structure Is Hard-Coded
The exact layout of each file (field order, widths, types) is baked into the program's source code. The program and the data are tightly glued together.
03
Same Data Gets Copied Everywhere
An employee's name and address sit in the Payroll file AND the HR file AND the Insurance file — three separate copies maintained by hand.
04
Change One File Format → Rewrite the Program
Add a single new field to a file and every program that touches it must be located, edited, recompiled, and re-tested. Maintenance becomes a nightmare.
⚠️
The Fundamental Weakness
In a file-based system, data and the programs that use it are inseparable.
There is no central manager, no shared definition of what the data means, and no protection
against two programs corrupting the same file at the same time. This single weakness causes
every problem we are about to list.
Section 04
Animated Diagram — File System vs DBMS
This picture is the heart of the whole topic. On the left, three programs each keep a private,
duplicated copy of employee data. On the right, all three share one guarded, single source of truth.
📊 How Data Is Organized: Files vs Database
Legend: ■ Duplicated copies (redundant) · ■ Single shared store · ■ Application programs
The left side stores the same employee data three times. The right side stores it once, with the DBMS controlling every read and write.
Section 05
The Problems With File-Based Systems
Because each program owns its own glued-on files, a predictable list of pains emerges.
These are the classic disadvantages every textbook lists — here they are with plain explanations.
📋
Data Redundancy
same fact stored many times
The same employee name lives in Payroll, HR, and Insurance files. Storage is wasted,
and every copy must be edited separately.
⚖️
Data Inconsistency
copies disagree
Update one copy and forget the others, and now the system "knows" two different
addresses for the same person. Which one is true?
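A minimal Python sketch of the two problems above, assuming three plain CSV files stand in for the Payroll, HR and Insurance programs' private files. The employee id 101 and both addresses are made-up illustration data; nothing here is a DBMS, which is exactly the point.

```python
import csv
import os
import tempfile

def write_file(path, rows):
    """Overwrite one program's private file with the given rows."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_address(path, emp_id):
    """Each program has to scan its own file to answer a question."""
    with open(path, newline="") as f:
        for row_id, _name, address in csv.reader(f):
            if row_id == emp_id:
                return address
    return None

with tempfile.TemporaryDirectory() as tmp:
    # Payroll, HR and Insurance each keep a private copy (hypothetical data).
    files = {prog: os.path.join(tmp, prog + ".csv")
             for prog in ("payroll", "hr", "insurance")}
    for path in files.values():
        write_file(path, [["101", "Riya", "12 Lake Road"]])

    # Riya moves, but only the Payroll program updates its file.
    write_file(files["payroll"], [["101", "Riya", "7 Hill Street"]])

    answers = {prog: read_address(path, "101") for prog, path in files.items()}

print(answers)  # {'payroll': '7 Hill Street', 'hr': '12 Lake Road', 'insurance': '12 Lake Road'}
print("distinct answers:", len(set(answers.values())))  # distinct answers: 2
```

Three copies of one fact gave two different answers; with a single shared store there would be only one place to update.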
🪟
Data Isolation
scattered, hard to combine
Data is spread across many files in many formats. Writing a report that joins them is
slow, manual, and error-prone.
🔗
Integrity Problems
no central rules
Rules like "age must be positive" live inside each program. One buggy program can write
garbage that the others happily trust.
⚡
Atomicity Failures
half-done updates
A transfer debits account A, then the power fails before account B is credited. The money simply
vanishes, because nothing rolls the debit back.
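The same failure can be reproduced with two ordinary JSON files, one per account. This is a sketch under stated assumptions: the `PowerFailure` exception and the `transfer` helper are hypothetical names, and the "crash" is simulated by raising an exception between the debit and the credit.

```python
import json
import os
import tempfile

class PowerFailure(Exception):
    """Hypothetical stand-in for the machine losing power mid-transfer."""

def read_balance(folder, account):
    with open(os.path.join(folder, account + ".json")) as f:
        return json.load(f)["balance"]

def write_balance(folder, account, balance):
    with open(os.path.join(folder, account + ".json"), "w") as f:
        json.dump({"balance": balance}, f)

def transfer(folder, src, dst, amount, crash_midway=False):
    write_balance(folder, src, read_balance(folder, src) - amount)  # step 1: debit
    if crash_midway:
        raise PowerFailure("power lost between debit and credit")
    write_balance(folder, dst, read_balance(folder, dst) + amount)  # step 2: credit

with tempfile.TemporaryDirectory() as tmp:
    write_balance(tmp, "A", 500)
    write_balance(tmp, "B", 500)
    try:
        transfer(tmp, "A", "B", 100, crash_midway=True)
    except PowerFailure:
        pass  # nothing in a plain file system knows how to undo step 1
    total = read_balance(tmp, "A") + read_balance(tmp, "B")

print("money in the system:", total)  # 900, not 1000: the 100 vanished
```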
👥
Concurrency Anomalies
two writers, one file
Two programs edit the same file at once and overwrite each other. The last writer wins;
the other change is lost silently.
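A lost update needs no real threads to demonstrate; scripting the interleaving by hand makes it deterministic. In this sketch (the file name and amounts are illustrative), program A deposits 50 and program B withdraws 20, but both read the balance before either one writes.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "account.json")

    def read_balance():
        with open(path) as f:
            return json.load(f)["balance"]

    def write_balance(value):
        with open(path, "w") as f:
            json.dump({"balance": value}, f)

    write_balance(100)

    # The interleaving is scripted by hand so the race happens every time.
    seen_by_a = read_balance()       # program A reads 100
    seen_by_b = read_balance()       # program B also reads 100
    write_balance(seen_by_a + 50)    # A deposits 50 and writes 150
    write_balance(seen_by_b - 20)    # B withdraws 20 and writes 80, erasing A's deposit
    final = read_balance()

print("final balance:", final)  # 80, although 100 + 50 - 20 = 130
```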
🔐
Weak Security
all-or-nothing access
File permissions are coarse. You cannot easily say "this clerk may see salaries but not
edit them" at the row or column level.
🔧
Hard Maintenance
data depends on programs
Change a file's layout and every program reading it must be rewritten and recompiled.
This is the absence of data independence.
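To see why a layout change forces a rewrite, here is a hypothetical fixed-width payroll record whose field positions are hard-coded into the parsing function. The layout, `parse_payroll_record`, and the values are assumptions made for illustration.

```python
# The payroll program's idea of the record layout lives in its code (hypothetical):
# characters 0-9 hold the name, characters 10-17 hold the salary.
def parse_payroll_record(line):
    return {"name": line[0:10].strip(), "salary": int(line[10:18])}

old_record = f"{'Riya':<10}{50000:08d}"              # "Riya      00050000"
print(parse_payroll_record(old_record))              # {'name': 'Riya', 'salary': 50000}

# HR inserts a 3-character dept field after the name. The file changed; the code did not.
new_record = f"{'Riya':<10}{'CSE':<3}{50000:08d}"    # "Riya      CSE00050000"
try:
    parse_payroll_record(new_record)
    broke = False
except ValueError as e:
    broke = True
    print("payroll program breaks:", e)
```

Every program that slices this file the same way would have to be found and edited; a DBMS instead resolves column names through its catalog.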
📁
No Standard Querying
re-code every question
Every new question ("who joined after 2020?") needs a brand-new program. There is no
universal query language like SQL.
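Contrast the two worlds on the question from the card above. In this sketch the CSV text and `joined_after_file` are hypothetical; the file-based answer needs its own parsing and filtering function, while the database answer is a SQL string run through Python's built-in `sqlite3` module against an in-memory database.

```python
import csv
import io
import sqlite3

# Hypothetical employee file contents.
EMPLOYEES_CSV = """id,name,joined
1,Riya,2019
2,Aman,2021
3,Neha,2023
"""

# File-based world: every new question means new parsing + filtering code.
def joined_after_file(text, year):
    reader = csv.DictReader(io.StringIO(text))
    return [row["name"] for row in reader if int(row["joined"]) > year]

# Database world: load once, then any question is just a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, joined INTEGER)")
rows = [(int(r["id"]), r["name"], int(r["joined"]))
        for r in csv.DictReader(io.StringIO(EMPLOYEES_CSV))]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
sql_answer = [name for (name,) in conn.execute(
    "SELECT name FROM employees WHERE joined > ? ORDER BY id", (2020,))]

print(joined_after_file(EMPLOYEES_CSV, 2020))  # ['Aman', 'Neha']
print(sql_answer)                              # ['Aman', 'Neha']
```

The next question ("who is in CSE?") would need another function in the file world, but only another `WHERE` clause in the database world.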
Section 06
DBMS vs File System — The Full Comparison
| Aspect | File-Based System | DBMS |
|---|---|---|
| Data redundancy | High (copies everywhere) | Controlled via normalization |
| Data consistency | Easily contradictory | Enforced by constraints |
| Data sharing | Difficult, file-by-file | Built-in, multi-user |
| Querying | Custom code each time | Standard SQL queries |
| Data independence | None (tightly coupled) | Logical & physical independence |
| Integrity rules | Buried in each program | Central, declarative |
| Concurrency control | Absent (lost updates) | Locking / MVCC managed |
| Recovery from crash | Manual, often impossible | Transactions + logs |
| Security granularity | Coarse file permissions | Per-user, per-table, per-column |
| Setup cost & complexity | Very low | Higher (software + skills) |
| Best for | Tiny, single-user, throwaway data | Shared, growing, mission-critical data |
🔎
Read the Last Two Rows Carefully
A DBMS is not automatically "better" for everything. For a tiny, single-user, one-off
task, a plain file is simpler and cheaper. The DBMS earns its keep the moment data is
shared, important, and growing.
Section 07
Characteristics of the Database Approach
What exactly makes the "database approach" different from just dumping data into files?
Four defining characteristics, straight from the classic definition (Elmasri & Navathe).
📑
1. Self-Describing Nature
data + metadata together
The database stores not just the data, but also a complete description of its own
structure in a catalog (the metadata). The DBMS reads this catalog to
understand any database — in a file system, that structure lived only inside program code.
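SQLite makes the self-describing nature easy to observe: the schema you declare is stored in its catalog table `sqlite_master`, and `PRAGMA table_info` reports column metadata. A minimal sketch, assuming an in-memory database and reusing the `students` table from the hands-on section as the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)")

# The DBMS kept the definition in its catalog; any program can ask for it.
for (ddl,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'"):
    print(ddl)

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
columns = [(name, col_type, bool(notnull))
           for cid, name, col_type, notnull, default, pk
           in conn.execute("PRAGMA table_info(students)")]
print(columns)  # [('id', 'INTEGER', False), ('name', 'TEXT', True), ('dept', 'TEXT', True)]
```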
🔌
2. Program–Data Independence
change storage, not programs
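Because structure lives in the catalog, you can change how data is stored
(for example, add an index) without touching the application programs; even a logical change such as
splitting a table can be hidden behind a view that keeps the old shape. The feature that makes this
possible is data abstraction: users see a clean conceptual view, not raw bytes.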
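A small sketch of physical data independence, assuming an in-memory SQLite database: the "application" function `top_scorers` (a hypothetical name) is called before and after an index is added, and it neither changes nor notices. The index name `idx_marks_score` is also illustrative.

```python
import sqlite3

def top_scorers(conn):
    """The 'application program': it only knows table and column names."""
    return conn.execute(
        "SELECT student_id, score FROM marks WHERE score >= 80 ORDER BY score DESC"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student_id INTEGER, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?)",
                 [(1, "DBMS", 88), (2, "DBMS", 73), (3, "DBMS", 91)])

before = top_scorers(conn)
conn.execute("CREATE INDEX idx_marks_score ON marks(score)")  # a purely physical change
after = top_scorers(conn)

print(before == after, after)  # True [(3, 91), (1, 88)]
```

The query planner may now choose the index, but that decision stays inside the DBMS; the application code is byte-for-byte the same.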
👁️
3. Multiple Views
one database, many lenses
Different users need different slices. A clerk sees names and seats; an accountant sees
payments. The same database can present many customized views of the same
underlying data, each hiding what is irrelevant.
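Views are the standard SQL mechanism for this. The sketch below assumes a hypothetical `bookings` table matching the clerk/accountant example; each view exposes only the columns its user needs, and both read the same stored row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical booking table: one copy of the data, shared by everyone.
conn.execute("""CREATE TABLE bookings (
    id INTEGER PRIMARY KEY, passenger TEXT, seat TEXT, amount_paid INTEGER)""")
conn.execute("INSERT INTO bookings VALUES (1, 'Riya', '12A', 1500)")

# Two lenses over the same rows; neither view stores a copy.
conn.execute("CREATE VIEW clerk_view AS SELECT passenger, seat FROM bookings")
conn.execute("CREATE VIEW accounts_view AS SELECT passenger, amount_paid FROM bookings")

# One change to the base table shows up through every view that includes it.
conn.execute("UPDATE bookings SET seat = '14C' WHERE id = 1")

seen = {}
for view in ("clerk_view", "accounts_view"):
    cur = conn.execute(f"SELECT * FROM {view}")
    seen[view] = ([col[0] for col in cur.description], cur.fetchall())
    print(view, *seen[view])
# clerk_view ['passenger', 'seat'] [('Riya', '14C')]
# accounts_view ['passenger', 'amount_paid'] [('Riya', 1500)]
```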
👥
4. Sharing & Transactions
many users, safely
Multiple users access the data concurrently. The DBMS uses concurrency
control and transactions to guarantee each user sees correct,
consistent data even when hundreds are reading and writing at once.
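One way to watch isolation in action with Python's `sqlite3`: two connections to the same database file play two users. The sketch assumes SQLite's WAL journal mode, which lets a reader keep working while a writer has an open transaction; the `seats` table is hypothetical. The reader keeps seeing the last committed value until the writer commits.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.db")
    writer = sqlite3.connect(path)   # "user 1"
    reader = sqlite3.connect(path)   # "user 2"
    writer.execute("PRAGMA journal_mode=WAL").fetchall()  # readers don't wait for the writer
    writer.execute("CREATE TABLE seats (id INTEGER PRIMARY KEY, holder TEXT)")
    writer.execute("INSERT INTO seats VALUES (1, NULL)")
    writer.commit()

    # User 1 books seat 1 inside a transaction that is still open.
    writer.execute("UPDATE seats SET holder = 'Riya' WHERE id = 1")
    before = reader.execute("SELECT holder FROM seats WHERE id = 1").fetchall()

    writer.commit()  # only now does the change become visible to others
    after = reader.execute("SELECT holder FROM seats WHERE id = 1").fetchall()

    writer.close()
    reader.close()

print("reader before commit:", before)  # [(None,)]
print("reader after commit: ", after)   # [('Riya',)]
```

Server DBMSs such as PostgreSQL achieve the same effect with MVCC across many concurrent sessions; SQLite's WAL mode is simply the easiest way to see it from a single script.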
✅
Bonus: Enforced Integrity
rules live in the data
Constraints (primary keys, foreign keys, checks) are declared once, centrally, and the
DBMS enforces them for every program automatically, so no program can sneak in data that breaks them.
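A sketch of centrally declared constraints, assuming SQLite. Note one SQLite-specific detail: foreign keys are only enforced after `PRAGMA foreign_keys = ON` on each connection. The tables and rows are illustrative; each bad statement stands in for a different program trying to write invalid data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when this is on
conn.execute("""CREATE TABLE students (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER CHECK (age > 0))""")
conn.execute("""CREATE TABLE marks (
    student_id INTEGER REFERENCES students(id),
    score      INTEGER CHECK (score BETWEEN 0 AND 100))""")
conn.execute("INSERT INTO students VALUES (1, 'Riya', 20)")

# Each of these could come from a different, buggy program.
bad_statements = [
    "INSERT INTO students VALUES (2, 'Aman', -3)",  # breaks CHECK (age > 0)
    "INSERT INTO students VALUES (3, NULL, 19)",    # breaks NOT NULL on name
    "INSERT INTO marks VALUES (99, 70)",            # no student 99: breaks FOREIGN KEY
    "INSERT INTO marks VALUES (1, 140)",            # breaks CHECK on score
]

rejected = 0
for stmt in bad_statements:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as e:
        rejected += 1
        print("rejected:", e)

print(rejected, "of", len(bad_statements), "bad rows rejected")  # 4 of 4 bad rows rejected
```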
🛡️
Bonus: Built-in Protection
security + recovery
Authorization controls who may do what, and recovery subsystems restore the database to a
consistent state after a crash — both impossible to do reliably with loose files.
🔑
The One Sentence To Remember
The database approach means data is self-describing, shared, independent of
programs, and centrally protected — everything a pile of files can never be.
Section 08
Animated Diagram — Where the DBMS Sits
Notice how the DBMS acts as a gatekeeper between users and the stored data and metadata.
No program ever touches the raw disk directly — every request flows through the DBMS,
which is exactly what enables data independence, security, and concurrency control.
Every blue request must pass through the amber DBMS before reaching the green data store. That single chokepoint is what makes security, integrity, and concurrency enforceable.
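SQLite has no `GRANT`/`REVOKE`, but its authorizer hook shows the gatekeeper idea directly: the DBMS consults a callback for every table and column a statement touches while compiling it. This sketch registers a hypothetical `clerk_policy` via `Connection.set_authorizer`; server systems such as PostgreSQL or MySQL express the same kind of rule with `GRANT` and `REVOKE` instead. The `employees` table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1, 'Riya', 50000)")
conn.commit()

def clerk_policy(action, arg1, arg2, db_name, source):
    """Hypothetical policy; SQLite calls it while compiling each statement."""
    if action == sqlite3.SQLITE_READ and (arg1, arg2) == ("employees", "salary"):
        return sqlite3.SQLITE_IGNORE  # the salary column reads back as NULL
    if action == sqlite3.SQLITE_UPDATE:
        return sqlite3.SQLITE_DENY    # clerks may not modify anything
    return sqlite3.SQLITE_OK

conn.set_authorizer(clerk_policy)

rows = conn.execute("SELECT name, salary FROM employees").fetchall()
print(rows)  # [('Riya', None)]

update_blocked = False
try:
    conn.execute("UPDATE employees SET name = 'X' WHERE id = 1")
except sqlite3.Error as e:
    update_blocked = True
    print("blocked:", e)
```

Because the check runs inside the DBMS, no program using this connection can bypass it by reading the file some other way through the same gate.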
Section 09
Advantages of a DBMS
✅
Controlled Redundancy
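Data is normally stored once and referenced by key, not copied, which saves storage and removes the root cause of inconsistency. Any deliberate duplication (e.g. for performance) is managed by the DBMS, not by hand.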
normalization, foreign keys
✅
Data Consistency & Integrity
Central constraints guarantee every program sees the same correct data, obeying the same rules.
PRIMARY KEY, CHECK, FK
✅
Easy, Powerful Querying
Ask any question with standard SQL in seconds — no new program to write, compile, and debug.
SELECT … JOIN … WHERE
✅
Data Independence
Change physical storage or logical structure without rewriting applications. Maintenance becomes sane.
logical + physical independence
✅
Concurrent Multi-User Access
Thousands of users read and write at once, and the DBMS keeps every transaction correct and isolated.
locking, MVCC, ACID
✅
Security & Recovery
Fine-grained access control, backup tools, and automatic crash recovery protect the data from people and disasters alike.
GRANT/REVOKE, transaction logs
Section 10
Disadvantages of a DBMS
A DBMS is powerful, but that power is not free. Knowing the costs is what separates a student
from an engineer who chooses the right tool.
💰
High Cost
Enterprise DBMS licences, powerful hardware, and skilled administrators all cost money — sometimes a lot.
licences, servers, RAM
🧠
Complexity
Designing, tuning, and securing a database needs real expertise. Bad design can be worse than no database.
schema design, tuning
👷
Needs a DBA
Larger systems require a dedicated Database Administrator for backups, performance, and security.
ongoing staffing
🔢
Overhead for Small Tasks
For a tiny, single-user, throwaway job, a full DBMS is heavy machinery to crack a nut — a flat file is simpler.
setup > benefit
⚠️
Single Point of Failure
Centralizing everything means if the database goes down, every dependent application stops at once.
needs replication / HA
📚
Learning Curve
Teams must learn SQL, data modelling, and the specific DBMS's quirks before becoming productive.
training time
Section 11
So When Is a Plain File Actually Fine?
✅ A DBMS Is Worth It When…
Data is shared by many users / programs
Data is large and keeps growing
Consistency & integrity really matter
You need ad-hoc querying and reports
Concurrent access is required
Security and recovery are critical
❌ A Simple File Is Enough When…
Data is tiny and rarely changes
Only one program / user touches it
It is a one-off or throwaway task
No complex relationships exist
No concurrent access is needed
Speed of setup beats every other concern
🎯
The Engineer's Judgment
"Use a database" is not always the right answer. Match the tool to the problem —
a config file or CSV is perfectly respectable for small, private, simple data. Reach for a
DBMS the moment sharing, scale, or safety enters the picture.
Section 12
Hands-On — The Database Approach in Python
Let's see the difference. Using Python's built-in sqlite3 module, we store
student and mark data once, link them with a key (no redundancy), and ask a
question with SQL instead of writing custom file-reading code.
Creating the Database & Querying It
import sqlite3

# Connect (creates the file the first time): this is our single source of truth
conn = sqlite3.connect('college.db')
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when this is on
cur = conn.cursor()

# Define structure ONCE. The DBMS stores this in its own catalog (self-describing).
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL
    )""")

# Marks reference the student by id: the name is NOT copied (no redundancy)
cur.execute("""
    CREATE TABLE IF NOT EXISTS marks (
        student_id INTEGER NOT NULL,
        subject    TEXT NOT NULL,
        score      INTEGER,
        PRIMARY KEY (student_id, subject),
        FOREIGN KEY (student_id) REFERENCES students(id)
    )""")

# Insert data once (OR IGNORE keeps a re-run from adding duplicate rows)
cur.executemany("INSERT OR IGNORE INTO students VALUES (?, ?, ?)", [
    (1, 'Riya', 'CSE'),
    (2, 'Aman', 'ECE'),
    (3, 'Neha', 'CSE'),
])
cur.executemany("INSERT OR IGNORE INTO marks VALUES (?, ?, ?)", [
    (1, 'DBMS', 88),
    (2, 'DBMS', 73),
    (3, 'DBMS', 91),
])
conn.commit()

# Ask ANY question with SQL: no custom file parser needed
cur.execute("""
    SELECT s.name, s.dept, m.score
    FROM students s
    JOIN marks m ON s.id = m.student_id
    WHERE m.score >= 80
    ORDER BY m.score DESC""")

print("Top DBMS scorers:")
for name, dept, score in cur.fetchall():
    print(f" {name:6s} ({dept}) -> {score}")

conn.close()
OUTPUT
Top DBMS scorers:
 Neha   (CSE) -> 91
 Riya   (CSE) -> 88
🎯
What This Demonstrates About the Database Approach
The student name Riya is stored exactly once. The marks table keeps only her
id, so renaming her is a single UPDATE that every query sees immediately (no redundancy,
no inconsistency). We also asked a brand-new question purely in SQL, with no new
file-parsing code. That is the database approach in a single query.
Atomicity — All-or-Nothing Transactions
This is the protection a loose file can never give you. If anything fails mid-way, the whole
block rolls back as if it never happened.
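import sqlite3

conn = sqlite3.connect('college.db')
conn.execute("PRAGMA foreign_keys = ON")  # without this, SQLite would accept the bad row below
try:
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE marks SET score = score + 5 WHERE subject = 'DBMS'")
        # A text score like 'oops' would NOT fail (SQLite would store it as TEXT),
        # so the bad row breaks the FOREIGN KEY instead: student 99 does not exist
        conn.execute("INSERT INTO marks VALUES (99, 'OS', 50)")
except sqlite3.Error as e:
    print(f"Transaction failed and was rolled back: {e}")
    # The +5 update was UNDONE too: nothing partial survives
    print("Scores are unchanged — atomicity protected the data.")
conn.close()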
OUTPUT
Transaction failed and was rolled back: ...
Scores are unchanged — atomicity protected the data.
Section 13
Bringing It Together
📖 The Story, Resolved
Why the Second College Sleeps Better
Remember Riya from Section 01? In the second college, her name lives in one
students record. The Hostel, Library, and Exams offices are just different views over
that same shared data. When she changes her surname, the DBMS updates it once, enforces every
rule, lets all three offices read it at the same time without clashing, and could recover it
if the server crashed mid-update.
The first college had files. The second had a database system. That single
architectural choice is the entire reason this field exists — and everything you learn next
(data models, the relational model, SQL, normalization, transactions) is built on this foundation.
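The resolved story maps onto a few lines of SQL. This sketch assumes an in-memory SQLite database; the surnames, room number, book title, and the three view names are hypothetical. The name is stored once in `students`, each office reads it through its own view, and a single `UPDATE` is enough for all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, room TEXT)")
conn.execute("CREATE TABLE loans (student_id INTEGER REFERENCES students(id), book TEXT)")
conn.execute("CREATE TABLE marks (student_id INTEGER REFERENCES students(id), score INTEGER)")
conn.execute("INSERT INTO students VALUES (1, 'Riya Sharma', 'H-12')")
conn.execute("INSERT INTO loans VALUES (1, 'Intro to DBMS')")
conn.execute("INSERT INTO marks VALUES (1, 88)")

# Each office gets its own view; none of them stores the name itself.
conn.execute("CREATE VIEW hostel_view AS SELECT name, room FROM students")
conn.execute("""CREATE VIEW library_view AS
    SELECT s.name, l.book FROM students s JOIN loans l ON l.student_id = s.id""")
conn.execute("""CREATE VIEW exams_view AS
    SELECT s.name, m.score FROM students s JOIN marks m ON m.student_id = s.id""")

# The surname changes ONCE, in ONE place, inside a transaction.
with conn:
    conn.execute("UPDATE students SET name = 'Riya Verma' WHERE id = 1")

names = {view: conn.execute(f"SELECT name FROM {view}").fetchone()[0]
         for view in ("hostel_view", "library_view", "exams_view")}
print(names)  # every office now sees 'Riya Verma'
```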
Section 14
Golden Rules
📚 Introduction to Databases — Key Takeaways
1
Data is raw facts, a database is organized related data, and a
DBMS is the software that manages it. Together the database and DBMS form a
database system.
2
The fatal flaw of file systems is that data and programs are glued together.
Every disadvantage — redundancy, inconsistency, poor sharing, hard maintenance — flows
from this one weakness.
3
The four characteristics of the database approach are: self-describing nature
(data + metadata), program–data independence, multiple views,
and controlled sharing with transactions.
4
A DBMS turns redundancy into controlled redundancy, enforces
integrity centrally, supports SQL querying, and provides
concurrency, security, and recovery — none of which files give you reliably.
5
A DBMS has real costs: money, complexity, a DBA, and a single point of failure.
Power is never free — respect the trade-off.
6
Choose the tool for the job. Reach for a DBMS when data is shared, large, or critical;
a plain file is fine for tiny, single-user, throwaway data.
7
Every read and write passes through the DBMS. That single chokepoint is precisely
what makes data independence, integrity, security, and concurrency enforceable.