Building a Secure, Scalable Cloud Infrastructure: A Complete Guide with Real-Time Project
1. AWS Services & Security
Key Services & Security Best Practices
Amazon EC2:
- Security Groups: Stateful firewalls controlling inbound/outbound traffic.
- NACLs: Stateless subnet-level filters for granular control.
- Encryption: Use AWS KMS to encrypt EBS volumes and snapshots (see the CLI sketch below).
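A quick CLI sketch of these controls (the security group ID, CIDR, and key alias below are placeholders):
# Allow inbound HTTPS on an existing security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 203.0.113.0/24
# Create a KMS-encrypted EBS volume
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --size 20 \
  --encrypted \
  --kms-key-id alias/aws/ebs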
Amazon S3:
- Bucket Policies: Restrict access by IP, IAM roles, or conditions.
- Block Public Access: Enable this setting to prevent accidental public exposure.
{ "Version": "2012-10-17", "Statement": [{ "Effect": "Deny", "Principal": "*", "Action": "s3:*", "Resource": "arn:aws:s3:::example-bucket/*", "Condition": {"Bool": {"aws:SecureTransport": "false"}} }] }
- Server-Side Encryption (SSE): Use SSE-KMS for audit trails.
AWS IAM:
- Least Privilege: Assign roles with minimal permissions.
- MFA Enforcement: Require multi-factor authentication for sensitive operations.
AWS CloudTrail:
- Audit Logs: Track API calls for compliance and security analysis.
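Assuming a trail is already enabled, recent API activity can be inspected straight from the CLI; the event name below is just one example:
# Show the ten most recent StopInstances calls recorded by CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=StopInstances \
  --max-results 10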
2. Regaining EC2 Access After Key Loss
Step-by-Step Recovery
- Attach Volume to a Rescue Instance: Stop the instance → detach the root volume → attach it to a rescue instance → mount it and add a new public key to the user's authorized_keys file → move the volume back and start the original instance (see the CLI sketch after this list).
- EC2 Instance Connect: Browser-based SSH, if the instance supports it.
- AWS Systems Manager (SSM): Run commands or open a shell via Session Manager, no key pair required.
- Terminate Instance (Last Resort): Relaunch from an AMI or snapshot with a new key pair, only if volume detachment isn't feasible.
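A minimal CLI sketch of the rescue-volume path (all IDs are placeholders; the rescue instance must be in the same Availability Zone as the volume):
# Stop the locked-out instance and detach its root volume
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
# Attach the volume to a rescue instance as a secondary device
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210 \
  --device /dev/sdf
# On the rescue instance: mount the volume, append the new public key to
# <mount>/home/<user>/.ssh/authorized_keys, unmount, detach, and reattach
# the volume to the original instance as its root device before starting it.
# If the SSM agent is running instead, skip all of the above
# (requires the Session Manager plugin locally):
aws ssm start-session --target i-0123456789abcdef0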
3. Restricting EC2 Access for a User with Private Key
Mitigation Strategies
- Revoke IAM Permissions: Remove EC2 access from the user’s policy.
- Rotate SSH Keys: Replace the key pair on the instance (update authorized_keys) and manage the new keys centrally, for example in AWS Systems Manager Parameter Store.
- Update Security Groups: Security groups cannot block traffic outright, so remove any inbound rule that allows SSH from the user's IP (or add an explicit deny with a network ACL). A CLI sketch follows below.
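A hedged sketch of the first and third points (the group ID, CIDR, user name, and policy ARN are placeholders):
# Security groups only allow traffic, so "blocking" the user means
# removing the rule that currently permits their IP
aws ec2 revoke-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 198.51.100.7/32
# Detach the policy that grants the user EC2 access
aws iam detach-user-policy \
  --user-name ExampleUser \
  --policy-arn arn:aws:iam::123456789012:policy/EC2AccessPolicy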
4. S3 Bucket Policy Example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/ExampleUser"},
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {"Bool": {"aws:SecureTransport": "false"}}
    }
  ]
}
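To attach a policy like this one, save it locally (the file name here is arbitrary) and push it with the CLI:
aws s3api put-bucket-policy \
  --bucket example-bucket \
  --policy file://policy.json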
5. Terraform Script with Tags & State Locking
provider "aws" {
region = "us-east-1"
}
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
tags = {
Name = "prod-vpc"
Environment = "production"
}
}
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
tags = {
Tier = "public"
}
}
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
}
resource "aws_nat_gateway" "nat" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public.id
}
# State Locking with S3 & DynamoDB
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/network.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-lock"
encrypt = true
}
}
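The backend's S3 bucket and DynamoDB table must exist before terraform init runs; a minimal sketch for the lock table (LockID is the hash key Terraform expects, and the table name matches the backend block above):
aws dynamodb create-table \
  --table-name terraform-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
# Re-initialize so Terraform starts using the S3 backend
terraform init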
6. VPC, Subnets, Route Tables & NAT Gateway
- VPC: Isolated network for resource deployment (e.g., CIDR 10.0.0.0/16).
- Subnets:
  - Public: Hosts web servers (route via Internet Gateway).
  - Private: Hosts databases (route via NAT Gateway for outbound internet).
- Route Tables: Each subnet is associated with a route table; the public table sends 0.0.0.0/0 to the Internet Gateway, while the private table sends it to the NAT Gateway (see the CLI sketch below).
- NAT Gateway: Lets private subnets reach the internet for updates/patches without being publicly exposed.
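A CLI sketch of the routing described above (the VPC, subnet, gateway, and route-table IDs are placeholders):
# Public route table: default route to the Internet Gateway
aws ec2 create-route-table --vpc-id vpc-0123456789abcdef0
aws ec2 create-route \
  --route-table-id rtb-0aaaa1111bbbb2222c \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-0123456789abcdef0
aws ec2 associate-route-table \
  --route-table-id rtb-0aaaa1111bbbb2222c \
  --subnet-id subnet-0123456789abcdef0
# Private route table: default route to the NAT Gateway instead
aws ec2 create-route \
  --route-table-id rtb-0dddd3333eeee4444f \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0123456789abcdef0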
7. Recovering Corrupted Terraform State
- S3 Versioning: Restore from a previous version.
- Terraform Cloud: Managed state with version history and team collaboration.
- Manual Import: Rebuild the state with terraform import, one resource at a time (see the sketch below).
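A recovery sketch against the backend configured earlier (the version ID is a placeholder returned by the first command):
# List versions of the state object (requires S3 versioning on the bucket)
aws s3api list-object-versions \
  --bucket my-terraform-state \
  --prefix prod/network.tfstate
# Download a known-good version
aws s3api get-object \
  --bucket my-terraform-state \
  --key prod/network.tfstate \
  --version-id EXAMPLE-VERSION-ID \
  network.tfstate
# Or rebuild individual resources into state by hand
terraform import aws_vpc.main vpc-0123456789abcdef0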
8. Terraform Drift Detection
Run terraform plan to detect discrepancies between the actual infrastructure and the Terraform state; two useful variants are sketched below.
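# Show drift only, without proposing configuration changes (recent Terraform versions)
terraform plan -refresh-only
# Exit codes make drift easy to catch in CI: 0 = no changes, 1 = error, 2 = changes/drift
terraform plan -detailed-exitcode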
10. Dockerfile with Healthcheck & Non-Root User
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Runtime
FROM node:18-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
# Drop privileges only after dependencies are installed into /app
USER node
# node:18-alpine ships BusyBox wget but not curl
HEALTHCHECK --interval=30s --timeout=10s \
  CMD wget -q --spider http://localhost:3000/health || exit 1
EXPOSE 3000
CMD ["node", "dist/index.js"]
13. Kubernetes Manifest with StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
# gp3 requires the AWS EBS CSI driver; the legacy in-tree
# kubernetes.io/aws-ebs provisioner only supports gp2/io1/st1/sc1
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
---
# Standalone PVC example; the StatefulSet below provisions its own
# PVCs through volumeClaimTemplates rather than using this one
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  storageClassName: fast
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          ports:
            - containerPort: 5432
          env:
            # The postgres image will not start without a superuser password;
            # the secret referenced here is an assumption and must already exist
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            # Use a subdirectory so the volume's lost+found directory
            # does not break initdb on first start
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        storageClassName: fast
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
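To apply and verify (the manifest file name is an assumption; the PVC name follows the <template>-<statefulset>-<ordinal> convention):
kubectl apply -f postgres.yaml
kubectl get storageclass fast
kubectl get pvc postgres-storage-postgres-0
kubectl rollout status statefulset/postgres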
16. Disaster Recovery & AWS Backup
- Multi-AZ RDS: Automatic failover for databases.
- AWS Backup: Schedule EBS/RDS backups into backup vaults, with lifecycle rules that move older recovery points to cold storage.
- Cross-Region Replication: Replicate data to a second region (S3 CRR, snapshot copies) and use Terraform to redeploy the infrastructure there (see the sketch below).
- Reference: AWS Disaster Recovery Whitepaper.
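As one concrete example of cross-region protection, an EBS snapshot can be copied to a second region (IDs are placeholders; the --region flag selects the destination):
aws ec2 copy-snapshot \
  --region us-west-2 \
  --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --description "DR copy of prod data volume"
# RDS offers an analogous command: aws rds copy-db-snapshot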
17. Service Mesh: Istio vs. Linkerd
| Feature | Istio | Linkerd |
|---|---|---|
| Complexity | High (requires Kubernetes expertise) | Lightweight and simple |
| Performance | Overhead from Envoy sidecar proxies | Minimal latency |
| Use Case | Large-scale microservices | Small to medium workloads |
19. Logging & Monitoring with Prometheus/Grafana
- Prometheus: Scrape Kubernetes pod metrics.
- Grafana: Visualize metrics with dashboards.
- AlertManager: Trigger alerts for CPU/memory thresholds.
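One common way to stand up this stack is the community Helm chart; the release name, namespace, and derived Grafana service name below are assumptions of that setup, not requirements:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
# Grafana service name follows the chart's <release>-grafana convention
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80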
20. CI/CD Pipeline with Zero-Downtime
GitHub Actions Workflow:
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Assumes AWS credentials and cluster access (kubeconfig) are provided to the
    # runner, e.g. via repository secrets or OIDC, which is outside this snippet
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.2
      - run: terraform init
      - run: terraform plan
      - run: terraform apply -auto-approve
      - name: Deploy to Kubernetes
        run: |
          kubectl apply -f k8s/deployment.yaml
          kubectl rollout status deployment/myapp
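The zero-downtime behavior comes from Kubernetes rolling updates rather than the workflow itself; a sketch of triggering and, if needed, reverting a rollout (the registry, image tag, and container name are assumptions):
# Old pods keep serving until new pods pass readiness checks
kubectl set image deployment/myapp myapp=registry.example.com/myapp:v2
kubectl rollout status deployment/myapp
# Roll back if the new version misbehaves
kubectl rollout undo deployment/myapp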
21. SonarQube with Quality Gates
- Docker Setup:
docker run -d -p 9000:9000 sonarqube
- Pipeline Integration:
- name: SonarQube Scan
  run: |
    mvn sonar:sonar \
      -Dsonar.projectKey=myapp \
      -Dsonar.host.url=http://sonar.example.com \
      -Dsonar.login=${{ secrets.SONAR_TOKEN }} \
      -Dsonar.qualitygate.wait=true
- Quality Gates: Block deployments if code coverage < 80% or critical bugs exist.
This comprehensive guide has covered the essential aspects of building a secure and scalable cloud infrastructure using AWS, Terraform, Docker, and Kubernetes. Each section has been meticulously crafted to address the 21 key questions while incorporating best practices and suggestions for improvement. The real-time project example of an e-commerce platform ties together the various components, demonstrating how they interact in a cohesive architecture.
By following the guidelines and examples provided, you can ensure that your cloud infrastructure is not only robust and efficient but also secure and compliant with industry standards. Whether you’re a beginner or an experienced professional, this guide serves as a valuable resource for navigating the complexities of cloud infrastructure management.
Next Steps
- Implement the Best Practices: Start applying the security measures and architectural designs discussed in your own projects.
- Explore Further: Dive deeper into each technology mentioned, as there are always new features and updates to learn.
- Stay Updated: Follow AWS, Terraform, Docker, and Kubernetes blogs and documentation for the latest trends and practices in cloud infrastructure.