Thursday, 20 February 2025

Mastering Kubernetes, CI/CD, and Cloud Infrastructure: An In-Depth Guide with Real-World Examples


In the rapidly evolving landscape of cloud-native technologies, mastering Kubernetes, CI/CD pipelines, and cloud infrastructure has become essential for developers and operations teams alike. Whether you’re deploying microservices at scale, automating workflows, or securing cloud environments, the complexity of modern systems demands a deep understanding of both foundational concepts and advanced best practices.

This guide is designed to bridge the gap between theory and real-world application. It walks you through critical topics—from exposing Kubernetes applications with Services and Helm charts to hardening CI/CD pipelines with Jenkins and Terraform—while addressing common pain points like cluster troubleshooting, resource optimization, and infrastructure security.

Why This Guide?

  • Hands-On Focus: Every section includes actionable examples, from YAML snippets to Terraform configurations, so you can apply concepts immediately.
  • Real-World Scenarios: Learn how to diagnose a down Kubernetes cluster, configure zero-downtime deployments, and enforce security policies in AWS.
  • Expert Insights: Avoid pitfalls with proven strategies for managing secrets, tuning autoscaling, and optimizing CI/CD pipelines.

Whether you’re refining your DevOps skills or architecting a cloud-native platform from scratch, this guide equips you with the knowledge to build resilient, scalable, and secure systems. Let’s dive in.

Table of Contents

  1. Kubernetes Services: Exposing Applications
  2. Accessing Applications via NodePort
  3. Helm Charts: Simplifying Kubernetes Deployments
  4. ConfigMaps and Secrets: Managing Configuration Data
  5. Resource Management in Kubernetes
  6. Troubleshooting a Down Kubernetes Cluster
  7. LivenessProbe and ReadinessProbe: Ensuring Application Health
  8. Monitoring with Prometheus and Grafana
  9. Jenkins CI/CD Pipelines: From Basics to Advanced
  10. Terraform: Secure State Management and Validation
  11. AWS Autoscaling and Load Balancers
  12. Security Best Practices
  13. Optimizing CI/CD and Infrastructure Code
  14. Conclusion

1. Kubernetes Services: Exposing Applications

Kubernetes services act as an abstraction layer to expose applications running in pods. Below are the four primary service types:

ClusterIP

  • Purpose: Internal communication within the cluster.
  • Example: A backend API service accessible only by other pods.
    apiVersion: v1
    kind: Service
    metadata:
      name: internal-api
    spec:
      type: ClusterIP
      ports:
        - port: 80
          targetPort: 8080
      selector:
        app: backend
    

NodePort

  • Purpose: Expose a service on a static port across all nodes.
  • Port Range: 30000–32767 (default).
  • Example: Exposing a frontend app on port 31000.
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend
    spec:
      type: NodePort
      ports:
        - port: 80
          targetPort: 80
          nodePort: 31000
      selector:
        app: frontend
    
  • Key Note: Traffic can reach the service via any node’s IP (not just the node hosting the pod). In cloud environments, ensure firewall rules allow traffic on the NodePort.

LoadBalancer

  • Purpose: Automatically provision an external load balancer (cloud-specific).
  • Example: Exposing a production app via AWS NLB.
    spec:
      type: LoadBalancer
      loadBalancerClass: service.k8s.aws/nlb
    

ExternalName

  • Purpose: Map a service to an external DNS name.
  • Example: Redirecting to a legacy database.
    apiVersion: v1
    kind: Service
    metadata:
      name: legacy-db
    spec:
      type: ExternalName
      externalName: legacy-db.example.com

2. Accessing Applications via NodePort

Step-by-Step Guide

  1. Deploy the Service:

    kubectl apply -f nodeport-service.yaml
    
  2. Retrieve NodePort and Node IPs:

    kubectl get svc frontend  # Output: PORT(S) 80:31000/TCP
    kubectl get nodes -o wide # Output: EXTERNAL-IP=192.168.1.100
    
  3. Access the Application:

    http://192.168.1.100:31000
    

Cloud Consideration: In AWS/GCP, configure security groups to allow inbound traffic on the NodePort.
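
For example, on AWS you might open the NodePort range on the worker nodes' security group (the group ID below is a placeholder, and in practice you would restrict the source CIDR to your own network):

    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 30000-32767 \
        --cidr 203.0.113.0/24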

3. Helm Charts: Simplifying Kubernetes Deployments

Helm Chart Structure

mychart/
  Chart.yaml
  values.yaml
  templates/
    deployment.yaml
    service.yaml

Passing Variables

  • Default Values (values.yaml):

    image:
      repository: nginx
      tag: stable
    
  • Template Reference (templates/deployment.yaml):

    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    
  • Override via CLI:

    helm install myapp ./mychart --set image.tag=latest
    

Real-World Use Case

Deploying a WordPress site with Helm:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install wp bitnami/wordpress --set service.type=LoadBalancer
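
Once the load balancer is provisioned, you can look up its address (the Service name wp-wordpress assumes the chart's usual <release>-wordpress naming; on AWS the address appears as a hostname):

kubectl get svc wp-wordpress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'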

4. ConfigMaps and Secrets: Managing Configuration Data

ConfigMaps

  • Create from Literals:

    kubectl create configmap app-config --from-literal=DB_HOST=db.example.com
    
  • Mount as Environment Variables:

    env:
      - name: DB_HOST
        valueFrom:
          configMapKeyRef:
            name: app-config
            key: DB_HOST
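
  • Load All Keys (optional): To inject every key from the ConfigMap as environment variables, envFrom is a compact alternative; a minimal sketch:

    envFrom:
      - configMapRef:
          name: app-config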
    

Secrets

  • Create a Secret:

    kubectl create secret generic db-creds --from-literal=password=s3cret!
    
  • Mount as a Volume:

    volumes:
      - name: creds
        secret:
          secretName: db-creds
    

Security Note: Enable Kubernetes Secrets encryption at rest or use HashiCorp Vault for production.
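
If you enable encryption at rest, the kube-apiserver is pointed at an EncryptionConfiguration file via --encryption-provider-config; a minimal sketch (the key is a placeholder for a locally generated 32-byte base64 value):

    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources: ["secrets"]
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: <base64-encoded-32-byte-key>
          - identity: {}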

5. Resource Management in Kubernetes

Setting Requests and Limits

resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Exceeding Limits

  • Memory: Container is terminated (OOMKilled).
  • CPU: Process is throttled but not terminated.

Example: A memory-hungry app exceeding its limit:

kubectl describe pod myapp # Shows OOMKilled status
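
To avoid relying on every manifest declaring these values, a namespace-level LimitRange can supply defaults; a minimal sketch:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "256Mi"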

6. Troubleshooting a Down Kubernetes Cluster

Diagnostic Checklist

  1. Check Node Status:

    kubectl get nodes
    
  2. Inspect Control Plane:

    kubectl get componentstatuses  # Deprecated, but still summarizes scheduler/controller-manager/etcd health
    ssh master-node 'systemctl status kubelet'
    
  3. Pod-Level Debugging:

    kubectl get pods -A | grep -v Running  # Identify non-running pods
    kubectl logs -p <pod-name>            # Previous logs for crashed pods
    
  4. Network Diagnostics:

    kubectl run -it --rm debug --image=nicolaka/netshoot -- sh
    nslookup my-svc.default.svc.cluster.local
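
  5. System Pods and Kubelet Logs (often worth checking as well):

    kubectl -n kube-system get pods -o wide      # CNI, kube-proxy, CoreDNS health
    journalctl -u kubelet --since "10 min ago"   # run on the affected node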
    

7. LivenessProbe and ReadinessProbe: Ensuring Application Health

Probe Types

  • LivenessProbe: Restarts the container if the app is unresponsive; on failure, the kubelet kills and restarts the container.
  • ReadinessProbe: Stops sending traffic if the app isn't ready (e.g., during startup); on failure, the pod is removed from Service endpoints.

Example Configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3

readinessProbe:
  exec:
    command: ["cat", "/app/ready"]
  initialDelaySeconds: 5

Tuning Tip: Adjust failureThreshold and periodSeconds based on app startup time.
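
For apps with long or variable startup times, a startupProbe can hold off liveness checks until the app has come up; a minimal sketch reusing the /healthz endpoint above:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # allows up to 30 * 10s = 300s of startup time
  periodSeconds: 10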

8. Monitoring with Prometheus and Grafana

Prometheus Architecture

Prometheus scrapes metrics from instrumented targets and exporters, stores them as time series, and pairs with Alertmanager for alerting and Grafana for dashboards.

Service Monitoring

  • ServiceMonitor Example:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp-monitor
    spec:
      endpoints:
        - port: web
      selector:
        matchLabels:
          app: myapp
    
  • Install Node Exporter:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install node-exporter prometheus-community/prometheus-node-exporter
    

Advanced Monitoring

  • Distributed Tracing: Use Jaeger to trace requests across microservices.
  • Custom Metrics: Expose app-specific metrics (e.g., http_requests_total).
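
If you already expose custom metrics such as http_requests_total, the Prometheus Operator's PrometheusRule resource can alert on them; a minimal sketch (names and thresholds are illustrative):

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: myapp-alerts
    spec:
      groups:
        - name: myapp.rules
          rules:
            - alert: HighErrorRate
              expr: rate(http_requests_total{status=~"5.."}[5m]) > 1
              for: 10m
              labels:
                severity: warning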

9. Jenkins CI/CD Pipelines: From Basics to Advanced

Pipeline Example with Stages

pipeline {
    agent { label 'linux' }
    stages {
        stage('Build') {
            steps {
                sh 'mvn clean package'
            }
        }
        stage('Test') {
            steps {
                sh 'mvn test'
            }
        }
        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh 'kubectl apply -f deploy.yaml'
            }
        }
    }
}

Shared Libraries

  1. Configure in Jenkins:

    • Navigate to Manage Jenkins → Configure System → Global Pipeline Libraries.
    • Add a library named shared-lib from https://github.com/myorg/shared-lib.git.
  2. Use in Pipeline:

    @Library('shared-lib')_
    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    mySharedFunction()
                }
            }
        }
    }
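
On the library side, a step such as mySharedFunction() typically lives in the library repository's vars/ directory; a minimal sketch of vars/mySharedFunction.groovy (the function name comes from the pipeline example above):

    // vars/mySharedFunction.groovy
    def call() {
        echo 'Running shared build logic'
        sh 'mvn -B clean package'
    }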
    

10. Terraform: Secure State Management and Validation

Statefile Configuration with S3 and Locking

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/network.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"  # Enable state locking
  }
}
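
The lock table referenced above must exist before terraform init; a minimal sketch that creates it with Terraform itself (the name matches the backend block, and the partition key must be LockID):

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}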

Validation and Security

  • Validate Syntax:
    terraform validate
    
  • S3 Bucket Policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": "arn:aws:s3::: my-terraform-state/*",
          "Condition": {
            "StringNotEquals": {
              "aws:SourceAccount": "YOUR_AWS_ACCOUNT_ID"
            }
          }
        }
      ]
    }
    

11. AWS Autoscaling and Load Balancers

Understanding Load Balancers

  • Application Load Balancer (ALB): Operates at the application layer (HTTP/HTTPS).
  • Network Load Balancer (NLB): Operates at the transport layer (TCP/UDP).

Setting Up Autoscaling

  1. Create an Auto Scaling Group (ASG):

    aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-asg \
        --launch-configuration-name my-launch-config --min-size 1 --max-size 5 \
        --desired-capacity 2 --vpc-zone-identifier subnet-12345678
  2. Attach Load Balancer:

    aws autoscaling attach-load-balancers --auto-scaling-group-name my-asg \
        --load-balancer-names my-load-balancer
  3. Scaling Policies:

    • Scale Up: Increase capacity based on CloudWatch metrics (e.g., CPU utilization).
    • Scale Down: Decrease capacity when metrics fall below a threshold (see the target-tracking sketch below).
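
A simple way to cover both directions is a target-tracking policy, which keeps the group scaled around a target metric; a sketch (the 50% CPU target is illustrative):

    aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg \
        --policy-name cpu-target-tracking --policy-type TargetTrackingScaling \
        --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":50.0}'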

Custom CloudWatch Metrics Example

aws cloudwatch put-metric-alarm --alarm-name "HighCPUUtilization" \
    --metric-name "CPUUtilization" --namespace "AWS/EC2" --statistic "Average" \
    --period 300 --threshold 70 --comparison-operator "GreaterThanThreshold" \
    --dimensions "Name=AutoScalingGroupName,Value=my-asg" --evaluation-periods 2 \
    --alarm-actions "arn:aws:sns:us-east-1:123456789012:my-sns-topic"

12. Security Best Practices

Implementing RBAC

  • Role Definition:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: default
      name: pod-reader
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    
  • Role Binding:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods
      namespace: default
    subjects:
    - kind: User
      name: jane
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io
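
You can verify the binding with impersonation, without switching users:

    kubectl auth can-i list pods --namespace default --as jane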
    

Pod Security Admission (PSA)

  • Note: PodSecurityPolicy is deprecated and was removed in Kubernetes 1.25; PSA replaces it by enforcing the Pod Security Standards (privileged, baseline, restricted) via namespace labels.
  • Example Policy (enforce the restricted standard on an example namespace):
    apiVersion: v1
    kind: Namespace
    metadata:
      name: secure-apps
      labels:
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
    

13. Optimizing CI/CD and Infrastructure Code

CI/CD Optimizations

  • Parallel Execution: Speed up builds by running tests in parallel.

    stages {
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps { sh 'mvn test' }
                }
                stage('Integration Tests') {
                    steps { sh 'mvn verify' }
                }
            }
        }
    }
    
  • Caching Dependencies: Use caching to speed up builds (the snippet below assumes the Jobcacher plugin, which provides the cache step).

    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    cache(maxCacheSize: 250, caches: [arbitraryFileCache(path: '.m2/repository')]) {
                        sh 'mvn clean package'
                    }
                }
            }
        }
    }
    
    

Infrastructure Code Management

  • Terraform Modularization: Organize Terraform code into reusable modules (a calling sketch follows after this list).
  • Using Terragrunt: Manage multiple Terraform configurations with ease.
    include {
      path = find_in_parent_folders()
    }
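
Calling a module might look like the sketch below (the source path and variable are placeholders for your own module):

    module "network" {
      source     = "./modules/network"
      cidr_block = "10.0.0.0/16"
    }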
    

14. Conclusion

This comprehensive guide has covered essential topics in Kubernetes, CI/CD, and cloud infrastructure management. By understanding and implementing these concepts, you can build robust, scalable, and secure applications in the cloud.

By following the best practices and examples outlined in this guide, you will be well-equipped to navigate the complexities of modern cloud-native application development and deployment. Embrace these tools and methodologies to enhance your workflow, improve collaboration, and ensure the reliability of your applications in production environments.
