Amazon Web Services (AWS) — The Complete Guide for Developers and Architects
A comprehensive, visual guide to Amazon Web Services — covering how AWS works, its 200+ services across compute, storage, databases, networking, security, and AI/ML, with real-world stories from Netflix, Airbnb, and NASA, animated architecture diagrams, comparison tables, and the seven golden rules every AWS architect must follow.
Section 01
The Origin Story — From Bookstore to the World's Backbone
📖 Real World Story
The Christmas That Changed Everything
In 1999, Amazon.com nearly collapsed under its own success. The holiday shopping season brought a flood of orders that their own servers couldn't handle. Engineers worked through Christmas Eve patching servers in a panic. Jeff Bezos watched the chaos and asked a dangerous question:
"What if every company in the world has this exact problem, and nobody has solved it properly?"
Seven years later, in 2006, Amazon launched S3 (Simple Storage Service) in March and EC2 (Elastic Compute Cloud) in August. They weren't selling books anymore. They were selling the cloud itself, and the world would never be the same. Today, Amazon Web Services (AWS) holds roughly one-third of the global cloud infrastructure market and runs workloads ranging from Netflix's streaming platform to NASA's satellite imagery of Earth.
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centres globally. Millions of customers — including the fastest-growing startups, largest enterprises, and leading government agencies — are using AWS to lower costs, become more agile, and innovate faster.
☁️
What Exactly is Cloud Computing?
Cloud computing is the on-demand delivery of IT resources over the internet — computing power, storage, databases, networking, analytics, machine learning, and more. Instead of buying and owning servers, you rent exactly what you need, when you need it, and pay only for what you use. AWS is the single largest provider of this service globally.
Section 02
AWS By The Numbers — Why It Matters
🌍
Global Reach
Infrastructure Scale
33 geographic Regions, 105 Availability Zones, and over 600 CloudFront edge locations, serving customers in 190+ countries (figures as of early 2024). Data rarely has to travel far.
💰
Market Dominance
Revenue
AWS generated about $91 billion in revenue in 2023 and held roughly 31% of the cloud infrastructure market, ahead of Microsoft Azure (23%) and Google Cloud (11%) individually, though not of the two combined.
🏢
Customer Breadth
Customers
Used by Netflix, Airbnb, NASA, Samsung, BMW, the CIA, and millions of startups. If you used the internet today, you almost certainly touched AWS infrastructure.
☁ Cloud Market Share, Q4 2023
Provider | Company | Share
AWS | Amazon | 31%
Azure | Microsoft | 23%
Google Cloud | Google | 11%
Others | (various) | 35%
Source: Synergy Research Group, Q4 2023
Section 03
The Three Service Models — IaaS, PaaS, SaaS
📖 Analogy
Renting vs. Buying a Kitchen
Imagine you want to run a restaurant.
IaaS (Infrastructure as a Service) is like renting an empty building with plumbing and electricity. You buy your own stoves, hire cooks, design the menu — total control, but total responsibility.
PaaS (Platform as a Service) is like renting a fully equipped professional kitchen. The stoves and fridges are there. You just bring your ingredients and recipes. You worry about cooking, not plumbing.
SaaS (Software as a Service) is like ordering from a restaurant. Someone else owns the kitchen, cooks the food, and serves it. You just eat — zero maintenance, zero control over the recipe.
AWS offers all three — and lets you mix and match depending on what you're building.
Model | You Manage | AWS Manages | AWS Examples | Best For
IaaS | OS, runtime, middleware, apps, data | Virtualisation, servers, storage, networking | EC2, VPC, EBS, S3 | Full control, custom environments
PaaS | Apps and data only | Runtime, OS, middleware, infra | Elastic Beanstalk, RDS, Lambda | Developers who don't want to manage servers
SaaS | Nothing (usage only) | Everything | Amazon WorkMail, Chime, Connect | Business users, non-technical teams
Section 04
The AWS Universe — Core Service Categories
AWS organises its 200+ services into broad categories. Below are the most important ones every developer and architect must know.
🏗 AWS Core Service Map
Section 05
Compute Services — The Brain of AWS
Compute is where your code runs. AWS offers multiple compute paradigms — from bare-metal virtual machines to fully serverless functions — so you can choose the right tool for each workload.
🖥 EC2 — Elastic Compute Cloud: The Foundation
What
Virtual machines in the cloud. You choose the CPU, RAM, OS, and network configuration.
Why
Full control over environment. Ideal for legacy apps, custom configurations, and workloads that need persistent compute.
Pricing
On-Demand (pay per second), Reserved (1–3 year commitment, up to 72% savings), Spot (unused capacity, up to 90% savings).
Story
Netflix reportedly runs more than 100,000 EC2 instances to serve 260 million subscribers. Its encoding pipeline, recommendation engine, and API servers run on EC2, while the video streams themselves are delivered through Netflix's own Open Connect CDN.
⚡ AWS Lambda — Serverless Functions
What
Run code without managing servers. Upload a function, define a trigger, and AWS handles everything — scaling, patching, availability.
Triggers
HTTP requests (via API Gateway), S3 uploads, DynamoDB streams, SQS messages, EventBridge (formerly CloudWatch Events) rules and schedules, and integrations with many other AWS services.
Pricing
Pay only when your code runs: $0.20 per 1 million requests and $0.0000166667 per GB-second of execution. The first 1 million requests and 400,000 GB-seconds per month are free.
Limit
15-minute execution timeout, 10 GB memory, 512 MB–10 GB ephemeral storage. Not suitable for long-running processes.
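The two Lambda rates quoted above can be turned into a rough monthly estimate. The sketch below is only an approximation: it applies the free request allowance but ignores the free GB-second allowance, and the function name and the example workload are assumptions made for illustration.

```python
# Rates quoted in the text above; actual rates vary by Region and architecture.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS_PER_MONTH = 1_000_000


def lambda_monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate a monthly Lambda bill in USD (hypothetical helper).

    Simplification: the free GB-second allowance is not applied.
    """
    billable_requests = max(0, requests - FREE_REQUESTS_PER_MONTH)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS

    # GB-seconds = invocations x seconds per invocation x memory in GB
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND

    return round(request_cost + compute_cost, 2)


# Hypothetical workload: 5M requests/month, 200 ms each, 512 MB of memory.
estimate = lambda_monthly_cost(5_000_000, 200, 512)
print(f"Estimated monthly cost: ${estimate}")  # prints 9.13
```

Most of the bill in this example comes from compute time rather than request count, which is why trimming duration or memory usually matters more than reducing invocations.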
⚡ Lambda Request Flow
🏆
EC2 vs Lambda — When to Choose Which
Use Lambda for: short tasks triggered by events, microservices, APIs, image processing, cron jobs, automation. Use EC2 for: long-running processes, apps that need specific OS configuration, WebSocket servers, anything that runs longer than 15 minutes or needs persistent local storage.
Section 06
Storage Services — Where Data Lives
🪣
Amazon S3
Simple Storage Service
The most used service on AWS. Object storage for any amount of data. Store files, images, videos, backups, static websites. 11 nines (99.999999999%) durability. Pricing from $0.023/GB/month.
💽
Amazon EBS
Elastic Block Store
Block storage — like a hard drive attached to an EC2 instance. For databases, OS volumes, and apps needing low-latency disk access. Persists independently from EC2.
🧊
S3 Glacier
Cold Archive Storage
Long-term archival storage at extremely low cost — as cheap as $0.001/GB/month. Retrieval takes minutes to hours. Used for compliance archives, old backups, historical data.
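To make the price gap between the storage classes concrete, here is a small worked calculation using the per-GB figures quoted above. The 10 TB archive size is a made-up example, and request, retrieval, and transfer charges are deliberately left out.

```python
# Per-GB monthly prices quoted in the text above.
S3_STANDARD_PER_GB = 0.023
GLACIER_PER_GB = 0.001  # the "as cheap as" archive figure


def monthly_storage_cost(gigabytes: float, price_per_gb: float) -> float:
    """Storage-only cost in USD; ignores request and retrieval fees."""
    return round(gigabytes * price_per_gb, 2)


archive_gb = 10_000  # hypothetical 10 TB of old backups

standard_cost = monthly_storage_cost(archive_gb, S3_STANDARD_PER_GB)
glacier_cost = monthly_storage_cost(archive_gb, GLACIER_PER_GB)

print(f"S3 Standard: ${standard_cost}/month")  # 230.0
print(f"Glacier:     ${glacier_cost}/month")   # 10.0
```

The trade-off is retrieval time and retrieval fees, so this comparison only holds for data that is rarely read.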
📖 Story
How Airbnb Stores 100 Million Photos
Airbnb hosts millions of property photos. Every time a guest searches for a listing, dozens of images must load instantly from anywhere in the world. Airbnb stores all photos in Amazon S3 and serves them through CloudFront — AWS's global CDN. When a user in Mumbai opens a listing, the image is served from the nearest edge location in India, not from a server in California. The result: sub-100ms image load times. The cost: a fraction of building their own global CDN.
Section 07
Database Services — From SQL to NoSQL to In-Memory
Service | Type | Best For | Scale | Notable Users
Amazon RDS | Relational (SQL) | OLTP apps, traditional SQL workloads | Vertical + read replicas | Airbnb, Lyft
Aurora | Relational (MySQL/PostgreSQL-compatible) | Enterprise OLTP needing high throughput | Up to 5× the throughput of standard MySQL (per AWS) | Samsung, Dow Jones
DynamoDB | NoSQL (Key-Value + Document) | Single-digit millisecond latency at large scale | Horizontal, automatic partitioning | Amazon.com, Lyft, Snapchat
ElastiCache | In-Memory (Redis/Memcached) | Session storage, caching, leaderboards | Sub-millisecond latency | Twitter, McDonald's
Redshift | Data Warehouse (Columnar) | Analytics, BI queries on petabytes | Petabyte scale | Nasdaq, Lyft
Neptune | Graph | Social networks, fraud detection, knowledge graphs | Managed graph clusters | Amazon Fraud Detector
🗄 Amazon RDS — Relational
Feature | Detail
Data Model | Tables, rows, SQL
Schema | Fixed, defined upfront
Scaling | Vertical (bigger instance)
Latency | 1–10 ms (single-region)
Best When | Complex JOINs needed
⚡ DynamoDB — NoSQL
Feature | Detail
Data Model | Key-value + document
Schema | Flexible, schema-less
Scaling | Horizontal (automatic)
Latency | Single-digit milliseconds
Best When | Massive scale, simple access
⚠️
The Database Choice Mistake
The most common AWS mistake is using RDS for everything, including workloads that don't need relational structure. If your app primarily retrieves items by a single ID, serves millions of users, or needs consistent single-digit-millisecond latency, DynamoDB is often cheaper and faster, with far less operational work. With RDS you still choose instance sizes and configure read replicas and failover. DynamoDB partitions data automatically and, in on-demand mode, charges per request and per GB stored.
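The difference between the two access patterns can be illustrated locally. In this sketch an in-memory SQLite database stands in for RDS and a plain dictionary stands in for a DynamoDB table. Both are substitutes chosen so the example runs anywhere, and the table names, keys, and data are invented.

```python
import sqlite3

# Relational pattern: normalised tables joined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")
rows = conn.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
print(rows)  # [('Ada', 65.0), ('Lin', 15.0)]

# Key-value pattern: the item is stored in the shape it will be read,
# so a lookup needs only the key and no join.
user_table = {
    "user#1": {"name": "Ada", "order_total": 65.0},
    "user#2": {"name": "Lin", "order_total": 15.0},
}


def get_item(table: dict, key: str):
    """Fetch one item by its key, or None if it does not exist."""
    return table.get(key)


print(get_item(user_table, "user#1"))
```

If most of your queries look like the second half, a key-value store fits. If they look like the first half, with ad hoc joins and aggregates, a relational database is the more natural choice.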
Section 08
Networking — How Data Moves
1
VPC — Virtual Private Cloud
Your private, isolated section of the AWS cloud. Define your own IP address range, create subnets, configure route tables and gateways. Your EC2 instances and RDS databases live inside a VPC, and Lambda functions can be attached to one when they need to reach private resources. Public subnets have a route to an internet gateway; private subnets do not, which is critical for security.
2
Route 53 — DNS Service
AWS's highly available DNS (Domain Name System) service. Register domain names, route traffic to AWS services, and implement health checks with automatic failover. Supports routing policies like geolocation, latency-based, and weighted routing. Handles billions of DNS queries daily and is backed by a 100% availability SLA.
3
CloudFront — CDN (Content Delivery Network)
A global CDN with 600+ edge locations worldwide. Caches your content (images, videos, HTML, API responses) at edge locations close to your users, which can cut latency for cached content from hundreds of milliseconds to tens of milliseconds or less. Integrated with AWS Shield for DDoS protection. Used by Amazon Prime Video, Twitch, and thousands of businesses.
4
Application Load Balancer (ALB)
Automatically distributes incoming traffic across multiple targets (EC2 instances, containers, Lambda functions). Routes based on URL path, host headers, or HTTP methods. Essential for high-availability architectures. Scales with traffic automatically, although very sudden spikes are worth load-testing in advance.
5
Direct Connect
A dedicated physical network connection from your on-premises data centre to AWS — bypassing the public internet entirely. Provides consistent network performance, lower latency, and higher security. Used by banks, hospitals, and any organisation that can't risk public internet variability for critical workloads.
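The VPC item above talks about defining an IP range and carving it into public and private subnets. A minimal planning sketch using Python's standard `ipaddress` module follows. The `10.0.0.0/16` range, the subnet names, and the `/24` size are illustrative assumptions, not recommendations.

```python
import ipaddress

# Hypothetical VPC address range.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Take the first four /24 blocks: two public and two private,
# one of each per Availability Zone.
blocks = list(vpc.subnets(new_prefix=24))[:4]
layout = {
    "public-a": blocks[0],
    "public-b": blocks[1],
    "private-a": blocks[2],
    "private-b": blocks[3],
}


def usable_hosts(subnet: ipaddress.IPv4Network) -> int:
    """AWS reserves five addresses in every subnet."""
    return subnet.num_addresses - 5


for name, subnet in layout.items():
    print(f"{name}: {subnet} ({usable_hosts(subnet)} usable addresses)")
```

Planning the ranges before creating anything helps avoid overlapping CIDR blocks, which are painful to fix once resources are running in them.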
Section 09
Security — The Shared Responsibility Model
📖 Key Mental Model
Who Guards What
AWS operates under a Shared Responsibility Model. Think of it like an apartment building:
AWS (the building owner) is responsible for the security of the cloud — the physical data centres, hardware, networking infrastructure, and the managed services themselves. They lock the building's front door, maintain the walls, and hire security guards.
You (the tenant) are responsible for security in the cloud — what you install, who you give keys to, what you leave unlocked. If you store sensitive data in a public S3 bucket, that's on you — not AWS. AWS gave you a door with a lock; you chose not to use it.
AWS Is Responsible For
✅ Physical data centre security
✅ Hypervisor & hardware patching
✅ Network infrastructure
✅ Managed service availability
✅ Global infrastructure security
You Are Responsible For
🔐 IAM users, roles, and permissions
🔐 Data encryption at rest & in transit
🔐 Security group & firewall rules
🔐 OS patching on EC2 instances
🔐 Application-level security & code
IAM — Identity and Access Management
IAM is the gatekeeper of your entire AWS account. It controls who (users, services, applications) can do what (create, read, update, delete) with which AWS resources.
👤
IAM Users
Individual accounts for people. Each has a username, password, and optional access keys. Best practice: do not use the root account for daily operations.
🏷
IAM Roles
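Identities that hand out temporary credentials to whoever assumes them, typically AWS services but also users and federated identities. An EC2 instance assumes a role to read from S3, so no passwords are stored and no keys are hardcoded. This is the correct, secure approach.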
📋
IAM Policies
JSON documents defining what actions are allowed or denied on which resources. Applied to users, groups, or roles. Follow least privilege — grant only what is necessary.
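As an illustration of a least-privilege policy document, the sketch below builds a read-only policy for a single S3 bucket and prints it as JSON. The bucket name, the `Sid` labels, and the helper function are hypothetical. The overall structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM policy format.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy allowing list and read on one bucket, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Listing applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Reading applies to the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = read_only_bucket_policy("example-photos-bucket")  # hypothetical bucket
print(json.dumps(policy, indent=2))
```

Note what is absent: no `s3:*`, no `"Resource": "*"`, and no write or delete actions. Anything not explicitly allowed is denied by default.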
🚨
The #1 AWS Security Mistake
Hardcoding AWS access keys (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) directly in application code or committing them to GitHub. Bots scan GitHub for leaked credentials 24/7. A leaked key can result in a $50,000+ AWS bill within hours as attackers spin up mining instances. Always use IAM Roles for services and environment variables for local development.
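For local development, one way to keep keys out of source code is to read them from the environment and fail fast when they are missing. The AWS SDKs already read these two variables on their own, so the helper below is a hypothetical sketch of the principle rather than something you would need in practice.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_credentials(environ=os.environ) -> dict:
    """Return credentials from the environment, never from source code.

    Raises RuntimeError naming any variable that is unset or empty.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Usage with a fake environment, so the example runs without real keys.
fake_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "not-a-real-secret",
}
creds = load_credentials(fake_env)
print(sorted(creds))
```

On EC2, Lambda, or ECS none of this is needed: attach an IAM Role and the SDK picks up temporary credentials automatically.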
Section 10
Real-World Use Cases — Who Uses What and Why
Company | Industry | Key AWS Services | Scale | Outcome
Netflix | Streaming | EC2, S3, DynamoDB, Lambda, Kinesis | 260M subscribers | Nearly all compute and storage on AWS; video delivered via its own CDN
Airbnb | Travel | EC2, RDS, S3, CloudFront, Redshift | 150M+ users | 100M photos served globally
NASA JPL | Space Science | S3, EC2, HPC, Redshift | Petabytes of data | Mars rover mission data stored
Pfizer | Pharma | SageMaker, S3, Redshift, HealthLake | Clinical trials | COVID vaccine research accelerated
Dow Jones | Media | Aurora, Kinesis, Lambda, CloudFront | Millions of articles | Real-time market news delivery
BMW | Automotive | IoT Core, SageMaker, Greengrass | 5M+ connected vehicles | Connected car telemetry platform
Section 11
Serverless Architecture — A Complete Example
Serverless doesn't mean there are no servers — it means you don't manage them. AWS handles provisioning, scaling, patching, and availability. You only write code and pay per invocation.
🏗 Production Serverless Web App Architecture
✅
Why This Architecture Scales to Billions
Every component in this architecture scales independently and automatically. Lambda scales from zero to thousands of concurrent executions; the default account quota is 1,000 per Region and can be raised on request. DynamoDB in on-demand mode absorbs large read/write volumes with single-digit-millisecond latency. CloudFront caches content at 600+ edge locations. There are no servers to patch, no capacity to pre-provision, and the cost at zero traffic is essentially zero dollars.
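The Lambda piece of such an architecture can be sketched as a single handler function. The example below assumes an API Gateway proxy-style event for a hypothetical `GET /items/{id}` route and uses an in-memory dictionary in place of DynamoDB so it runs locally. The route, item data, and field names are all invented for illustration.

```python
import json

# In-memory stand-in for a DynamoDB table (hypothetical data).
ITEMS = {"42": {"id": "42", "name": "Sample listing"}}

JSON_HEADERS = {"Content-Type": "application/json"}


def handler(event, context):
    """Handle a proxy-style event and return a proxy-style response."""
    # pathParameters may be absent or None when the route has no parameters.
    item_id = (event.get("pathParameters") or {}).get("id")
    item = ITEMS.get(item_id)

    if item is None:
        return {
            "statusCode": 404,
            "headers": JSON_HEADERS,
            "body": json.dumps({"error": "not found"}),
        }

    # The body must be a string, so the item is serialised to JSON.
    return {"statusCode": 200, "headers": JSON_HEADERS, "body": json.dumps(item)}


# Local invocation with a hand-built event; no AWS account needed.
response = handler({"pathParameters": {"id": "42"}}, None)
print(response["statusCode"], response["body"])
```

Because the handler is a plain function, it can be unit-tested locally before it is ever deployed.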
Section 12
AI and Machine Learning — AWS Makes ML Accessible
AWS offers three tiers of AI/ML services — from raw infrastructure for researchers to pre-built APIs for developers who just want to add intelligence to their apps without any ML expertise.
🧠 AWS AI/ML Stack — Three Layers
1
AI Services (Pre-built APIs): No ML knowledge needed. Call an API, get intelligence back. Rekognition (image/video analysis), Comprehend (NLP, sentiment), Transcribe (speech-to-text), Translate (dozens of languages), Polly (text-to-speech), Fraud Detector, Personalize (recommendations).
2
Amazon SageMaker (ML Platform): Build, train, and deploy custom ML models. A fully managed platform covering the entire ML lifecycle: data preparation, feature engineering, training, tuning, deployment, and monitoring. Used by Intuit, Thomson Reuters, and GE Healthcare.
3
Amazon Bedrock (Generative AI): Access foundation models (Claude, Llama 2, Titan, Stable Diffusion) via a single API. Build generative AI applications without managing ML infrastructure. Fine-tune models on your data, implement RAG (Retrieval-Augmented Generation), and deploy AI agents. AWS has described it as one of its fastest-growing services.
Section 13
Pricing — How AWS Charges and How to Optimise
⏱
Pay As You Go
No upfront costs. Pay only for what you use, when you use it. EC2 billed per second (minimum 60 seconds). S3 billed per GB stored + per request. Lambda billed per millisecond of execution.
📅
Reserved Instances
Commit to 1 or 3 years and save up to 72% versus on-demand pricing. Best for stable, predictable workloads. Standard Reserved Instances can be resold on the Reserved Instance Marketplace if your needs change.
💎
Spot Instances
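Use spare EC2 capacity at discounts of up to 90%. You pay the current Spot price (no bidding is required, though you can set a maximum price), and AWS can reclaim the instance with a 2-minute warning. Perfect for batch jobs, CI/CD, big data processing, and fault-tolerant workloads.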
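To see what the "up to" discounts above mean in practice, the sketch below compares one instance running all month under each model. The $0.10 hourly rate is a made-up figure, and the discounts are the best-case numbers from the text, so real savings will usually be smaller.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(hourly_rate: float, discount: float = 0.0,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost in USD for one instance at a given discount (0.0 to 1.0)."""
    return round(hourly_rate * (1 - discount) * hours, 2)


on_demand_rate = 0.10  # hypothetical on-demand price per hour

costs = {
    "on-demand": monthly_cost(on_demand_rate),
    "reserved (best case, 72% off)": monthly_cost(on_demand_rate, 0.72),
    "spot (best case, 90% off)": monthly_cost(on_demand_rate, 0.90),
}

for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}/month")
```

With these assumptions the monthly figures come out to roughly $73, $20, and $7. The Spot number only applies to workloads that tolerate interruption.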
Service | Free Tier | Typical Small App/Month | Enterprise Scale/Month
EC2 t2.micro | 750 hrs/month (12 months) | ~$8–$15 | $500–$50,000+
S3 | 5 GB storage, 20,000 GETs | ~$1–$5 | $100–$10,000+
Lambda | 1M requests/month (forever) | ~$0–$3 | $50–$5,000+
DynamoDB | 25 GB, 200M requests/month | ~$0–$5 | $100–$20,000+
RDS (db.t3.micro) | 750 hrs/month (12 months) | ~$15–$30 | $200–$5,000+
💡
How to Control Your AWS Bill
Set up AWS Budgets with email/SMS alerts before you start spending. Use Cost Explorer to visualise spending by service, region, and tag. Enable Trusted Advisor for cost optimisation recommendations. Tag all resources with project/team names so you know exactly what's costing what. Use S3 lifecycle policies to automatically move old data to Glacier. The biggest surprise bills come from data transfer costs and accidentally leaving large EC2 instances running.
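The lifecycle tip above can be expressed as a small configuration document. The sketch below builds a rule in the general shape S3 lifecycle configurations use, moving objects under a prefix to Glacier after a set number of days. The helper name, the `logs/` prefix, and the 90-day threshold are assumptions for illustration, and the dictionary is only printed, not sent to AWS.

```python
import json


def archive_rule(prefix: str, days_to_glacier: int) -> dict:
    """Build a lifecycle configuration with one transition rule."""
    if days_to_glacier < 1:
        raise ValueError("days_to_glacier must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


lifecycle = archive_rule("logs/", 90)  # hypothetical prefix and threshold
print(json.dumps(lifecycle, indent=2))
```

Keeping rules like this in code, rather than clicking them together in the console, makes them reviewable and repeatable across buckets.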
Section 14
The AWS Well-Architected Framework — Building Production Systems
AWS distilled the knowledge from thousands of production architectures into six pillars that every cloud architect must master. This is the official guide to building systems that are secure, resilient, high-performing, and cost-efficient.
🔐
Pillar 1 — Security
Protect data, systems, and assets. Implement least-privilege IAM. Enable MFA. Encrypt at rest and in transit. Use CloudTrail to log all API calls. Detect threats with GuardDuty.
⚡
Pillar 2 — Performance
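Use the right resource types and sizes for each workload (AWS calls this pillar Performance Efficiency). Leverage a CDN. Cache aggressively with ElastiCache. Use serverless for variable workloads. Monitor with CloudWatch. Serve latency-sensitive users from a Region close to them.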
🛡
Pillar 3 — Reliability
Design for failure. Deploy across multiple Availability Zones. Use Auto Scaling. Implement health checks. Plan and test disaster recovery. Use Circuit Breaker patterns.
💰
Pillar 4 — Cost Optimisation
Adopt consumption-based pricing. Right-size instances. Use Spot for batch. S3 Lifecycle policies. Delete unused resources. Measure ROI of every service.
🔧
Pillar 5 — Operational Excellence
Automate everything. Infrastructure as Code (CloudFormation / Terraform). CI/CD pipelines with CodePipeline. Runbooks for every failure mode. Post-mortems after incidents.
🌱
Pillar 6 — Sustainability
Minimise environmental impact. Use energy-efficient instance types (Graviton). Right-size to reduce waste. Amazon set a goal of matching 100% of its electricity use with renewable energy by 2025.
Section 15
AWS Benefits — Why Companies Choose the Cloud
Benefit | Old Way (On-Premises) | AWS Way | Business Impact
Speed | Months to provision servers | Minutes to launch globally | Ideas go to production faster
Cost Model | Huge upfront CapEx | Pay-per-use OpEx | Preserve capital for the business
Scale | Over-provision for peaks | Auto-scale in seconds | Handle Black Friday without pre-buying
Reliability | Single data centre = SPOF | Multi-AZ, multi-Region | 99.99%+ availability SLAs
Security | In-house security team costs | Enterprise security included | Compliance ready (PCI, HIPAA, SOC2)
Innovation | Build infrastructure first | 200+ managed services ready | Focus on product, not plumbing
Global Reach | Open offices in each country | Deploy to any region in minutes | Go global on day one
Section 16
Golden Rules — The AWS Architect's Non-Negotiables
☁ AWS Architecture — Non-Negotiable Rules
1
Never use the root account for daily operations. Create an IAM user or role with the minimum required permissions. The root account is for billing and account-level changes only. Protect it with MFA, a hardware key if possible, and store credentials in a password manager.
2
Design for failure from day one. Assume every component will fail. Use multiple Availability Zones. Add health checks. Build circuit breakers. The 99.99% EC2 SLA applies only when instances are deployed across at least two Availability Zones in a Region. A single EC2 instance is covered only by a weaker instance-level commitment.
3
Never hardcode credentials. Use IAM Roles for services, Secrets Manager for database passwords and API keys, and environment variables for local development. A secret committed to Git — even a private repo — is a potential breach. Rotate credentials regularly.
4
Tag everything, always. Every EC2 instance, S3 bucket, RDS database, and Lambda function should have at minimum: Environment, Project, and Owner tags. Without tags, cost attribution is impossible and compliance audits become nightmares.
5
Automate infrastructure with code. Use CloudFormation or Terraform to define all AWS resources. Never click through the console to create production infrastructure. If it's not in code, it doesn't exist in a repeatable way, can't be reviewed, and can't be destroyed cleanly.
6
Set up billing alerts before spending a single dollar. Open AWS Budgets, set a monthly budget appropriate to your project (even $10), and configure email + SNS alerts at 50%, 80%, and 100%. Budgets warn you but do not cap spending, so act on the alerts; five minutes of configuration upfront prevents most surprise bills.
7
Choose managed services over self-managed whenever possible. Running your own Kafka cluster on EC2 is weeks of operational work — vs. using Amazon MSK (Managed Kafka) in minutes. Every managed service is AWS's team patching, scaling, and monitoring it 24/7. Your time is better spent building your product, not managing infrastructure.
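Rule 2 above rests on simple arithmetic: redundant copies fail together far less often than a single copy fails alone. The sketch below works through that calculation under one strong assumption, that failures in different Availability Zones are independent. The 99.5% per-copy figure is purely illustrative and is not an AWS SLA number.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` independent replicas suffices."""
    return 1 - (1 - single) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


single_copy = 0.995  # illustrative availability of one instance in one AZ

for copies in (1, 2, 3):
    a = combined_availability(single_copy, copies)
    print(f"{copies} AZ(s): {a:.6%} available, "
          f"~{downtime_minutes_per_year(a):.1f} min/year down")
```

Under these assumptions a second Availability Zone cuts expected downtime from about 2,628 minutes a year to about 13. Real failures are not perfectly independent, so treat the result as an upper bound on the benefit.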
🚀
The AWS Learning Path
Start with the AWS Free Tier: 12 months of free EC2, S3, and RDS usage within the limits above, plus an always-free Lambda allowance. Build a project: deploy a static website on S3 + CloudFront, then add a serverless API with Lambda + API Gateway, then connect DynamoDB. Once comfortable, study for the AWS Certified Cloud Practitioner exam (the entry point), then the AWS Certified Solutions Architect - Associate. These certifications are recognised worldwide and regularly appear in lists of the highest-paying IT certifications.