Sunday, 16 March 2025

How to Write a Kubernetes Manifest for a Database Server and Mount a PVC to It?

In the era of cloud-native applications, Kubernetes has emerged as the de facto platform for orchestrating containerized workloads. While stateless applications are relatively straightforward to manage, stateful applications like databases present unique challenges. Databases require persistent storage, stable network identities, and high availability—features that demand careful configuration in Kubernetes.

This guide provides an in-depth walkthrough of deploying a production-ready database (using PostgreSQL as an example) on Kubernetes. We’ll cover everything from foundational concepts like Persistent Volume Claims (PVCs) to advanced strategies for high availability, security, and disaster recovery. By the end, you’ll understand how to:

  • Use StatefulSets for stable, scalable database deployments.
  • Securely manage credentials with Kubernetes Secrets.
  • Configure Storage Classes for cloud-optimized storage.
  • Implement high availability and automated backups.
  • Monitor database health with Prometheus and Grafana.

Table of Contents

  1. Understanding Kubernetes Storage: PVs, PVCs, and Storage Classes
  2. Why StatefulSets Trump Deployments for Databases
  3. Step-by-Step: Writing a PostgreSQL StatefulSet Manifest
  4. Securing Your Database: Secrets and Network Policies
  5. Optimizing Storage with Cloud-Specific Storage Classes
  6. High Availability: Anti-Affinity and Readiness Probes
  7. Disaster Recovery: Automated Backups to S3
  8. Monitoring with Prometheus and PostgreSQL Exporter
  9. Common Pitfalls and How to Avoid Them

1. Understanding Kubernetes Storage: PVs, PVCs, and Storage Classes

Persistent Volumes (PVs)

A Persistent Volume (PV) is a cluster-wide storage resource provisioned by an administrator or dynamically via a Storage Class. PVs abstract the underlying storage infrastructure (e.g., AWS EBS, GCP Persistent Disk) and decouple storage from pods.

Persistent Volume Claims (PVCs)

A Persistent Volume Claim (PVC) is a user’s request for storage. PVCs bind to PVs, allowing pods to consume storage without needing to know the specifics of the storage backend.

# Example PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-encrypted  # Uses a custom Storage Class
  resources:
    requests:
      storage: 50Gi
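After saving the claim to a file (the filename below is just an example), you can apply it and check its status. Note that with a `WaitForFirstConsumer` Storage Class, the PVC will show `Pending` until a pod that uses it is scheduled—this is expected, not an error.

```shell
kubectl apply -f postgres-pvc.yaml   # example filename
kubectl get pvc postgres-pvc         # STATUS shows Pending until a consuming pod is scheduled, then Bound
```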

Storage Classes

A Storage Class defines the “type” of storage (e.g., SSD, HDD) and provisioning parameters. Cloud providers offer pre-configured Storage Classes, but you can create custom ones for encryption, performance, or cost optimization.

# AWS gp3 Storage Class with Encryption
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"  # Enables EBS encryption
volumeBindingMode: WaitForFirstConsumer  # Delays provisioning until pod scheduling
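If you want PVCs that omit `storageClassName` to use this class automatically, you can mark it as the cluster default with a standard annotation (make sure only one Storage Class carries it):

```yaml
# Add to the StorageClass metadata to make it the cluster-wide default
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
```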

2. Why StatefulSets Trump Deployments for Databases

Deployments: Designed for Stateless Workloads

  • Pros: Simple scaling, rolling updates.
  • Cons: Pods are ephemeral with no stable network identity, and all replicas would share a single PVC (or lose their data entirely).

StatefulSets: Built for Stateful Workloads

  • Stable Network Identity: Each pod gets a unique hostname (e.g., postgres-0, postgres-1).
  • Ordered Scaling: Pods are created/terminated sequentially.
  • Persistent Storage: Each pod gets its own PVC via volumeClaimTemplates.

3. Step-by-Step: Writing a PostgreSQL StatefulSet Manifest

Step 1: Create a Headless Service

A headless Service (no ClusterIP) enables direct DNS resolution to individual pods.

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - port: 5432
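With the headless Service in place, each StatefulSet pod gets a stable DNS record of the form `<pod>.<service>.<namespace>.svc.cluster.local`. You can verify resolution from a throwaway pod (this assumes the `default` namespace and that the StatefulSet from the next step is already running):

```shell
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup postgres-0.postgres-headless.default.svc.cluster.local
```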

Step 2: Define the StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3  # Three pods; note PostgreSQL replication itself still needs separate configuration or an operator (e.g. CloudNativePG)
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15.3  # Pinned version
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata  # Subdirectory avoids clashing with the volume's lost+found directory
          envFrom:
            - secretRef:
                name: postgres-secrets  # Securely inject credentials
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - 'pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"'  # Uses the injected secret values instead of hardcoded credentials
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - 'pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"'
          resources:
            requests:
              memory: 2Gi
              cpu: 500m
            limits:
              memory: 4Gi
              cpu: 2
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: postgres
              topologyKey: kubernetes.io/hostname  # Spread pods across nodes
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 50Gi
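You can then apply the manifest (again, the filename is an example) and watch the pods come up in order. The `volumeClaimTemplates` section creates one PVC per pod, named after the template and the pod:

```shell
kubectl apply -f postgres-statefulset.yaml
kubectl rollout status statefulset/postgres
kubectl get pods -l app=postgres   # postgres-0, postgres-1, postgres-2 appear sequentially
kubectl get pvc                    # one claim per pod, e.g. postgres-data-postgres-0
```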

4. Securing Your Database: Secrets and Network Policies

Step 1: Create a Secret for Credentials

kubectl create secret generic postgres-secrets \
  --from-literal=POSTGRES_USER=admin \
  --from-literal=POSTGRES_PASSWORD=MySecurePassword! \
  --from-literal=POSTGRES_DB=mydatabase
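Equivalently, you can define the Secret declaratively with `stringData` (Kubernetes base64-encodes the values for you). Be careful not to commit plaintext secrets to version control—tools like Sealed Secrets or an external secret manager are better for GitOps workflows:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secrets
type: Opaque
stringData:
  POSTGRES_USER: admin
  POSTGRES_PASSWORD: MySecurePassword!  # placeholder; do not commit real credentials
  POSTGRES_DB: mydatabase
```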

Step 2: Restrict Access with Network Policies

Only allow traffic from your application pods.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-app
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: my-app  # Replace with your app's label
      ports:
        - protocol: TCP
          port: 5432
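Note that NetworkPolicies only take effect if your CNI plugin supports them (e.g. Calico or Cilium). The policy above already isolates the postgres pods for Ingress, since any policy that selects a pod switches it to default-deny for the listed policy types. If you also want to lock down the rest of the namespace, a common companion is a namespace-wide default-deny:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}  # Selects every pod in the namespace
  policyTypes:
    - Ingress      # No ingress rules listed, so all inbound traffic is denied by default
```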

5. Optimizing Storage with Cloud-Specific Storage Classes

AWS Example: gp3 with Encryption

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "10000"  # Customize IOPS for high-performance workloads
  throughput: "500"  # MB/s
volumeBindingMode: WaitForFirstConsumer

GCP Example: Regional Persistent Disk

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd  # Replicates across zones
volumeBindingMode: WaitForFirstConsumer
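If the CSI driver supports it (both the AWS EBS and GCP PD drivers do), it's worth enabling volume expansion on either class so you can grow a database volume later without recreating it:

```yaml
# Add at the top level of either Storage Class
allowVolumeExpansion: true
```

With this set, growing a volume is just a matter of editing the PVC's `spec.resources.requests.storage` to a larger value; the driver resizes the underlying disk and filesystem.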

6. High Availability: Anti-Affinity and Readiness Probes

Pod Anti-Affinity

Prevents scheduling multiple database pods on the same node:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: postgres
        topologyKey: kubernetes.io/hostname
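One caveat: with a `required` rule, a cluster with fewer schedulable nodes than replicas will leave pods stuck in `Pending`. On small or test clusters, a softer `preferred` rule lets the scheduler co-locate pods as a last resort:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100  # Strongly prefer spreading, but allow co-location if unavoidable
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: postgres
          topologyKey: kubernetes.io/hostname
```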

Liveness and Readiness Probes

Ensure Kubernetes restarts unhealthy pods and routes traffic only to ready instances.
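For databases, a startup probe is also worth considering: after an unclean shutdown, PostgreSQL crash recovery can take longer than a liveness probe's tolerance, causing a restart loop. A sketch of such a probe (the thresholds are illustrative):

```yaml
startupProbe:
  exec:
    command: ["sh", "-c", "pg_isready -U admin -d mydatabase"]
  failureThreshold: 30
  periodSeconds: 10  # Allows up to ~5 minutes for recovery before liveness checks take over
```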

7. Disaster Recovery: Automated Backups to S3

CronJob for Daily Backups

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: my-registry/pg-backup:15  # hypothetical custom image: it must bundle both pg_dump and the AWS CLI (amazon/aws-cli alone has no pg_dump)
              command:
                - /bin/sh
                - -c
                - |
                  PGPASSWORD="$POSTGRES_PASSWORD" pg_dump -h postgres-headless -U "$POSTGRES_USER" "$POSTGRES_DB" | gzip | aws s3 cp - "s3://my-bucket/backups/$(date +%Y-%m-%d).sql.gz"
              envFrom:
                - secretRef:
                    name: postgres-secrets
              env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: access_key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: secret_key
          restartPolicy: OnFailure
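Backups you have never restored are backups you don't have—test the restore path regularly. A minimal sketch, assuming the same hypothetical image (with `psql` and the AWS CLI) run from a pod inside the cluster, and an example backup date:

```shell
# Hypothetical restore: fetch a dump from S3 and replay it into the database
aws s3 cp s3://my-bucket/backups/2025-03-15.sql.gz - | gunzip | \
  PGPASSWORD="$POSTGRES_PASSWORD" psql -h postgres-headless -U admin mydatabase
```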

8. Monitoring with Prometheus and PostgreSQL Exporter

Deploy PostgreSQL Exporter

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-exporter
  template:
    metadata:
      labels:
        app: postgres-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9187"
    spec:
      containers:
        - name: exporter
          image: prometheuscommunity/postgres-exporter  # pin a specific tag in production
          env:
            - name: POSTGRES_PASSWORD  # must be defined here so the $(POSTGRES_PASSWORD) reference below expands
              valueFrom:
                secretKeyRef:
                  name: postgres-secrets
                  key: POSTGRES_PASSWORD
            - name: DATA_SOURCE_NAME
              value: "postgresql://admin:$(POSTGRES_PASSWORD)@postgres-headless:5432/mydatabase?sslmode=disable"
          ports:
            - containerPort: 9187
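For the `postgres-exporter:9187` scrape target to resolve, the exporter Deployment needs a Service in front of it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-exporter
  labels:
    app: postgres-exporter
spec:
  selector:
    app: postgres-exporter
  ports:
    - port: 9187
      targetPort: 9187
```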

Setting Up Prometheus

To monitor your PostgreSQL database, you need to set up Prometheus to scrape metrics from the PostgreSQL Exporter. This involves configuring Prometheus to include the exporter in its scrape configuration.

# Example Prometheus configuration
scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

9. Common Pitfalls and How to Avoid Them

1. Not Using Resource Limits

Failing to set resource requests and limits can lead to resource starvation. Always define these in your StatefulSet.

resources:
  requests:
    memory: "2Gi"
    cpu: "500m"
  limits:
    memory: "4Gi"
    cpu: "2"

2. Ignoring Liveness and Readiness Probes

Without these probes, Kubernetes cannot effectively manage pod health. Always include them in your manifests.

3. Hardcoding Secrets

Avoid hardcoding sensitive information in your manifests. Use Kubernetes Secrets to manage credentials securely.

4. Not Implementing Backups

Regular backups are crucial for disaster recovery. Implement automated backups using CronJobs as shown earlier.

5. Lack of Monitoring

Without monitoring, you cannot effectively manage your database’s health. Set up Prometheus and Grafana to visualize metrics.

Deploying a production-grade database on Kubernetes requires careful planning and execution. By leveraging StatefulSets, Kubernetes Secrets, and cloud-specific Storage Classes, you can create a resilient and secure database environment. Implementing high availability strategies, automated backups, and monitoring will further enhance your database’s reliability.

Next Steps:

  • Experiment with the provided YAML manifests in a test environment.
  • Explore additional features like horizontal pod autoscaling for your database.
  • Consider integrating with tools like Grafana for advanced monitoring and alerting.

By following the best practices outlined in this guide, you can ensure that your database deployment on Kubernetes is robust, secure, and ready for production workloads.
