Thursday, 20 February 2025

Mastering Kubernetes, CI/CD, and Cloud Infrastructure: An In-Depth Guide with Real-World Examples


In the rapidly evolving landscape of cloud-native technologies, mastering Kubernetes, CI/CD pipelines, and cloud infrastructure has become essential for developers and operations teams alike. Whether you’re deploying microservices at scale, automating workflows, or securing cloud environments, the complexity of modern systems demands a deep understanding of both foundational concepts and advanced best practices.

This guide is designed to bridge the gap between theory and real-world application. It walks you through critical topics—from exposing Kubernetes applications with Services and Helm charts to hardening CI/CD pipelines with Jenkins and Terraform—while addressing common pain points like cluster troubleshooting, resource optimization, and infrastructure security.

Why This Guide?

  • Hands-On Focus: Every section includes actionable examples, from YAML snippets to Terraform configurations, so you can apply concepts immediately.
  • Real-World Scenarios: Learn how to diagnose a down Kubernetes cluster, configure zero-downtime deployments, and enforce security policies in AWS.
  • Expert Insights: Avoid pitfalls with proven strategies for managing secrets, tuning autoscaling, and optimizing CI/CD pipelines.

Whether you’re refining your DevOps skills or architecting a cloud-native platform from scratch, this guide equips you with the knowledge to build resilient, scalable, and secure systems. Let’s dive in.

Table of Contents

  1. Kubernetes Services: Exposing Applications
  2. Accessing Applications via NodePort
  3. Helm Charts: Simplifying Kubernetes Deployments
  4. ConfigMaps and Secrets: Managing Configuration Data
  5. Resource Management in Kubernetes
  6. Troubleshooting a Down Kubernetes Cluster
  7. LivenessProbe and ReadinessProbe: Ensuring Application Health
  8. Monitoring with Prometheus and Grafana
  9. Jenkins CI/CD Pipelines: From Basics to Advanced
  10. Terraform: Secure State Management and Validation
  11. AWS Autoscaling and Load Balancers
  12. Security Best Practices
  13. Optimizing CI/CD and Infrastructure Code
  14. Conclusion

1. Kubernetes Services: Exposing Applications

Kubernetes services act as an abstraction layer to expose applications running in pods. Below are the four primary service types:

ClusterIP

  • Purpose: Internal communication within the cluster.
  • Example: A backend API service accessible only by other pods.
    apiVersion: v1
    kind: Service
    metadata:
      name: internal-api
    spec:
      type: ClusterIP
      ports:
        - port: 80
          targetPort: 8080
      selector:
        app: backend
    

NodePort

  • Purpose: Expose a service on a static port across all nodes.
  • Port Range: 30000–32767 (default).
  • Example: Exposing a frontend app on port 31000.
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend
    spec:
      type: NodePort
      ports:
        - port: 80
          targetPort: 80
          nodePort: 31000
      selector:
        app: frontend
    
  • Key Note: Traffic can reach the service via any node’s IP (not just the node hosting the pod). In cloud environments, ensure firewall rules allow traffic on the NodePort.

LoadBalancer

  • Purpose: Automatically provision an external load balancer (cloud-specific).
  • Example: Exposing a production app via AWS NLB.
    spec:
      type: LoadBalancer
      loadBalancerClass: service.k8s.aws/nlb
    

ExternalName

  • Purpose: Map a service to an external DNS name.
  • Example: Redirecting to a legacy database.
    apiVersion: v1
    kind: Service
    metadata:
      name: legacy-db
    spec:
      type: ExternalName
      externalName: legacy-db.example.com

2. Accessing Applications via NodePort

Step-by-Step Guide

  1. Deploy the Service:

    kubectl apply -f nodeport-service.yaml
    
  2. Retrieve NodePort and Node IPs:

    kubectl get svc frontend  # Output: PORT(S) 80:31000/TCP
    kubectl get nodes -o wide # Output: EXTERNAL-IP=192.168.1.100
    
  3. Access the Application:

    http://192.168.1.100:31000
    

Cloud Consideration: In AWS/GCP, configure security groups to allow inbound traffic on the NodePort.
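
For example, on AWS you might open the NodePort range on the worker nodes' security group (the group ID below is a placeholder, and in practice you would restrict the source CIDR to your own network):

    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 30000-32767 \
        --cidr 203.0.113.0/24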

3. Helm Charts: Simplifying Kubernetes Deployments

Helm Chart Structure

mychart/
  Chart.yaml
  values.yaml
  templates/
    deployment.yaml
    service.yaml

Passing Variables

  • Default Values (values.yaml):

    image:
      repository: nginx
      tag: stable
    
  • Template Reference (templates/deployment.yaml):

    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    
  • Override via CLI:

    helm install myapp ./mychart --set image.tag=latest
    

Real-World Use Case

Deploying a WordPress site with Helm:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install wp bitnami/wordpress --set service.type=LoadBalancer
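
Once the load balancer is provisioned, you can look up its address (the Service name wp-wordpress assumes the chart's usual <release>-wordpress naming; on AWS the address appears as a hostname):

kubectl get svc wp-wordpress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'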

4. ConfigMaps and Secrets: Managing Configuration Data

ConfigMaps

  • Create from Literals:

    kubectl create configmap app-config --from-literal=DB_HOST=db.example.com
    
  • Mount as Environment Variables:

    env:
      - name: DB_HOST
        valueFrom:
          configMapKeyRef:
            name: app-config
            key: DB_HOST
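
  • Load All Keys (optional): To inject every key from the ConfigMap as environment variables, envFrom is a compact alternative; a minimal sketch:

    envFrom:
      - configMapRef:
          name: app-config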
    

Secrets

  • Create a Secret:

    kubectl create secret generic db-creds --from-literal=password=s3cret!
    
  • Mount as a Volume:

    volumes:
      - name: creds
        secret:
          secretName: db-creds
    

Security Note: Enable Kubernetes Secrets encryption at rest or use HashiCorp Vault for production.
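
If you enable encryption at rest, the kube-apiserver is pointed at an EncryptionConfiguration file via --encryption-provider-config; a minimal sketch (the key is a placeholder for a locally generated 32-byte base64 value):

    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources: ["secrets"]
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: <base64-encoded-32-byte-key>
          - identity: {}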

5. Resource Management in Kubernetes

Setting Requests and Limits

resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Exceeding Limits

  • Memory: Container is terminated (OOMKilled).
  • CPU: Process is throttled but not terminated.

Example: A memory-hungry app exceeding its limit:

kubectl describe pod myapp # Shows OOMKilled status
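
To avoid relying on every manifest declaring these values, a namespace-level LimitRange can supply defaults; a minimal sketch:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "256Mi"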

6. Troubleshooting a Down Kubernetes Cluster

Diagnostic Checklist

  1. Check Node Status:

    kubectl get nodes
    
  2. Inspect Control Plane:

    kubectl get componentstatuses  # Deprecated, but still summarizes scheduler/controller-manager/etcd health
    ssh master-node 'systemctl status kubelet'
    
  3. Pod-Level Debugging:

    kubectl get pods -A | grep -v Running  # Identify non-running pods
    kubectl logs -p <pod-name>            # Previous logs for crashed pods
    
  4. Network Diagnostics:

    kubectl run -it --rm debug --image=nicolaka/netshoot -- sh
    nslookup my-svc.default.svc.cluster.local
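
  5. System Pods and Kubelet Logs (often worth checking as well):

    kubectl -n kube-system get pods -o wide      # CNI, kube-proxy, CoreDNS health
    journalctl -u kubelet --since "10 min ago"   # run on the affected node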
    

7. LivenessProbe and ReadinessProbe: Ensuring Application Health

Probe Types

  • LivenessProbe: Restarts the container if the app is unresponsive; on failure, the kubelet kills and restarts the container.
  • ReadinessProbe: Stops sending traffic if the app isn't ready (e.g., during startup); on failure, the pod is removed from Service endpoints.

Example Configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3

readinessProbe:
  exec:
    command: ["cat", "/app/ready"]
  initialDelaySeconds: 5

Tuning Tip: Adjust failureThreshold and periodSeconds based on app startup time.
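
For apps with long or variable startup times, a startupProbe can hold off liveness checks until the app has come up; a minimal sketch reusing the /healthz endpoint above:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # allows up to 30 * 10s = 300s of startup time
  periodSeconds: 10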

8. Monitoring with Prometheus and Grafana

Prometheus Architecture

Prometheus scrapes metrics from instrumented targets and exporters, stores them as time series, and pairs with Alertmanager for alerting and Grafana for dashboards.

Service Monitoring

  • ServiceMonitor Example:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp-monitor
    spec:
      endpoints:
        - port: web
      selector:
        matchLabels:
          app: myapp
    
  • Install Node Exporter:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install node-exporter prometheus-community/prometheus-node-exporter
    

Advanced Monitoring

  • Distributed Tracing: Use Jaeger to trace requests across microservices.
  • Custom Metrics: Expose app-specific metrics (e.g., http_requests_total).
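
If you already expose custom metrics such as http_requests_total, the Prometheus Operator's PrometheusRule resource can alert on them; a minimal sketch (names and thresholds are illustrative):

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: myapp-alerts
    spec:
      groups:
        - name: myapp.rules
          rules:
            - alert: HighErrorRate
              expr: rate(http_requests_total{status=~"5.."}[5m]) > 1
              for: 10m
              labels:
                severity: warning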

9. Jenkins CI/CD Pipelines: From Basics to Advanced

Pipeline Example with Stages

pipeline {
    agent { label 'linux' }
    stages {
        stage('Build') {
            steps {
                sh 'mvn clean package'
            }
        }
        stage('Test') {
            steps {
                sh 'mvn test'
            }
        }
        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh 'kubectl apply -f deploy.yaml'
            }
        }
    }
}

Shared Libraries

  1. Configure in Jenkins:

    • Navigate to Manage Jenkins → Configure System → Global Pipeline Libraries.
    • Add a library named shared-lib from https://github.com/myorg/shared-lib.git.
  2. Use in Pipeline:

    @Library('shared-lib')_
    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    mySharedFunction()
                }
            }
        }
    }
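
On the library side, a step such as mySharedFunction() typically lives in the library repository's vars/ directory; a minimal sketch of vars/mySharedFunction.groovy (the function name comes from the pipeline example above):

    // vars/mySharedFunction.groovy
    def call() {
        echo 'Running shared build logic'
        sh 'mvn -B clean package'
    }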
    

10. Terraform: Secure State Management and Validation

Statefile Configuration with S3 and Locking

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/network.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"  # Enable state locking
  }
}
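
The lock table referenced above must exist before terraform init; a minimal sketch that creates it with Terraform itself (the name matches the backend block, and the partition key must be LockID):

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}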

Validation and Security

  • Validate Syntax:
    terraform validate
    
  • S3 Bucket Policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": "arn:aws:s3::: my-terraform-state/*",
          "Condition": {
            "StringNotEquals": {
              "aws:SourceAccount": "YOUR_AWS_ACCOUNT_ID"
            }
          }
        }
      ]
    }
    

11. AWS Autoscaling and Load Balancers

Understanding Load Balancers

  • Application Load Balancer (ALB): Operates at the application layer (HTTP/HTTPS).
  • Network Load Balancer (NLB): Operates at the transport layer (TCP/UDP).

Setting Up Autoscaling

  1. Create an Auto Scaling Group (ASG):

    aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-asg \
        --launch-configuration-name my-launch-config --min-size 1 --max-size 5 \
        --desired-capacity 2 --vpc-zone-identifier subnet-12345678
  2. Attach Load Balancer:

    aws autoscaling attach-load-balancers --auto-scaling-group-name my-asg \
        --load-balancer-names my-load-balancer
  3. Scaling Policies:

    • Scale Up: Increase capacity based on CloudWatch metrics (e.g., CPU utilization).
    • Scale Down: Decrease capacity when metrics fall below a threshold (see the target-tracking sketch below).
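
A simple way to cover both directions is a target-tracking policy, which keeps the group scaled around a target metric; a sketch (the 50% CPU target is illustrative):

    aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg \
        --policy-name cpu-target-tracking --policy-type TargetTrackingScaling \
        --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":50.0}'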

Custom CloudWatch Metrics Example

aws cloudwatch put-metric-alarm --alarm-name "HighCPUUtilization" \
    --metric-name "CPUUtilization" --namespace "AWS/EC2" --statistic "Average" \
    --period 300 --threshold 70 --comparison-operator "GreaterThanThreshold" \
    --dimensions "Name=AutoScalingGroupName,Value=my-asg" --evaluation-periods 2 \
    --alarm-actions "arn:aws:sns:us-east-1:123456789012:my-sns-topic"

12. Security Best Practices

Implementing RBAC

  • Role Definition:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: default
      name: pod-reader
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    
  • Role Binding:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods
      namespace: default
    subjects:
    - kind: User
      name: jane
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io
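
You can verify the binding with impersonation, without switching users:

    kubectl auth can-i list pods --namespace default --as jane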
    

Pod Security Admission (PSA)

  • Note: PodSecurityPolicy is deprecated and was removed in Kubernetes 1.25; PSA replaces it by enforcing the Pod Security Standards (privileged, baseline, restricted) via namespace labels.
  • Example Policy (enforce the restricted standard on an example namespace):
    apiVersion: v1
    kind: Namespace
    metadata:
      name: secure-apps
      labels:
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
    

13. Optimizing CI/CD and Infrastructure Code

CI/CD Optimizations

  • Parallel Execution: Speed up builds by running tests in parallel.

    stages {
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps { sh 'mvn test' }
                }
                stage('Integration Tests') {
                    steps { sh 'mvn verify' }
                }
            }
        }
    }
    
  • Caching Dependencies: Use caching to speed up builds (the snippet below assumes the Jobcacher plugin, which provides the cache step).

    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    cache(maxCacheSize: 250, caches: [arbitraryFileCache(path: '.m2/repository')]) {
                        sh 'mvn clean package'
                    }
                }
            }
        }
    }
    
    

Infrastructure Code Management

  • Terraform Modularization: Organize Terraform code into reusable modules (a calling sketch follows after this list).
  • Using Terragrunt: Manage multiple Terraform configurations with ease.
    include {
      path = find_in_parent_folders()
    }
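
Calling a module might look like the sketch below (the source path and variable are placeholders for your own module):

    module "network" {
      source     = "./modules/network"
      cidr_block = "10.0.0.0/16"
    }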
    

14. Conclusion

This comprehensive guide has covered essential topics in Kubernetes, CI/CD, and cloud infrastructure management. By understanding and implementing these concepts, you can build robust, scalable, and secure applications in the cloud.

By following the best practices and examples outlined in this guide, you will be well-equipped to navigate the complexities of modern cloud-native application development and deployment. Embrace these tools and methodologies to enhance your workflow, improve collaboration, and ensure the reliability of your applications in production environments.
