A Production-Ready Guide: Mastering Metrics Collection with Prometheus and Grafana
In modern infrastructure and application management, visibility into system performance is non-negotiable. Prometheus and Grafana form a powerhouse duo for metrics collection, storage, and visualization. This guide provides a production-ready walkthrough of setting up Prometheus and Grafana, complete with security best practices, service management, and advanced configurations. Whether you’re monitoring a single server or a distributed system, this guide will equip you with actionable steps to build a robust monitoring stack.
Table of Contents
1. Understanding the Tools
- What is Prometheus?
- What is Grafana?
- How They Work Together
2. Setting Up Prometheus
- Installing Prometheus
- Configuring Scrape Targets
- Running as a Systemd Service
3. Monitoring Host Metrics with Node Exporter
- Installing Node Exporter
- Systemd Service Setup
- Verifying Metrics
4. Installing Grafana Securely
- Adding the Grafana Repository
- HTTPS Configuration with Let’s Encrypt
- Enabling Authentication
5. Integrating Prometheus with Grafana
- Adding Prometheus as a Data Source
- Building a Dashboard: CPU, Memory, Disk, and Network
6. Advanced Configurations
- Setting Up Alerts in Grafana
- Remote Storage with Thanos
- Dynamic Service Discovery
7. Security Best Practices
- Securing Prometheus with Reverse Proxies
- Grafana User Permissions
8. Troubleshooting Common Issues
- Firewall and Port Conflicts
- Permission Errors
1. Understanding the Tools
What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It pulls metrics from targets at configured intervals, stores them in a time-series database, and allows querying via PromQL. Ideal for dynamic environments (e.g., Kubernetes), it supports multi-dimensional data collection.
What is Grafana?
Grafana is a visualization platform that turns metrics into interactive dashboards. It supports 50+ data sources, including Prometheus, and offers features like alerts, annotations, and plugins.
How They Work Together
Prometheus collects and stores metrics, while Grafana queries and visualizes them. Together, they provide end-to-end monitoring for infrastructure, applications, and services.
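To make the division of labor concrete, here is a minimal sketch of the kind of request a client such as Grafana sends to Prometheus: a PromQL expression passed to the instant-query endpoint /api/v1/query. The helper name instant_query_url is hypothetical, and the base URL assumes a local Prometheus on its default port.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    # urlencode percent-escapes braces, quotes and '=' in the PromQL expression.
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Assumed local instance; adjust the host and port to your deployment.
print(instant_query_url("http://localhost:9090", 'up{job="node"}'))
```

Fetching that URL returns JSON, which is what a dashboard panel ultimately renders.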
2. Setting Up Prometheus
Step 1: Download and Install
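# Fetch the latest version from https://prometheus.io/download/
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
# Move the extracted directory to the path the systemd unit in Step 3 expects
sudo mv prometheus-2.47.0.linux-amd64 /opt/prometheus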
Step 2: Configure Scrape Targets
Edit prometheus.yml to monitor Node Exporter:
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
Step 3: Run as a Systemd Service
Create /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data
Restart=always
[Install]
WantedBy=multi-user.target
Reload and start:
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
Step 4: Verify
Access http://localhost:9090/targets. Targets should show UP status; the node target will stay DOWN until Node Exporter (next section) is running.
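The same check can be scripted against the /api/v1/targets endpoint, which returns the active targets and their health as JSON. The sketch below only parses such a response; the function name unhealthy_targets and the sample payload are illustrative, and the payload is trimmed to the two fields used here.

```python
import json


def unhealthy_targets(targets_json: str) -> list[str]:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    payload = json.loads(targets_json)
    active = payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "<unknown>") for t in active if t.get("health") != "up"]


# Illustrative response body, reduced to the fields this sketch reads.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"scrapeUrl": "http://localhost:9100/metrics", "health": "down"},
        ]
    },
})

print(unhealthy_targets(sample_response))
```

In practice you would fetch the body from http://localhost:9090/api/v1/targets and pass it to the function.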
3. Monitoring Host Metrics with Node Exporter
Step 1: Install Node Exporter
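wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
# Move the extracted directory to the path the systemd unit in Step 2 expects
sudo mv node_exporter-1.6.1.linux-amd64 /opt/node_exporter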
Step 2: Systemd Service Setup
Create /etc/systemd/system/node_exporter.service:
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/opt/node_exporter/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
Reload systemd and start the service:
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
Step 3: Verify Metrics
Visit http://localhost:9100/metrics to see raw metrics (e.g., node_cpu_seconds_total).
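The /metrics page is plain text in the Prometheus exposition format: comment lines starting with #, then one sample per line. A minimal parsing sketch follows; it assumes no trailing timestamps on sample lines (Node Exporter does not emit them) and is not a full parser. The function name parse_metrics and the sample text are illustrative.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus labels, as written) to its sample value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamp assumed).
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


sample_text = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_load1 0.42
"""

for series, value in parse_metrics(sample_text).items():
    print(series, value)
```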
4. Installing Grafana Securely
Step 1: Add Grafana Repository
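sudo apt-get install -y apt-transport-https software-properties-common curl
# Fetch the signing key first so the repository entry can reference it
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://apt.grafana.com/gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana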
Step 2: Enable HTTPS with Let’s Encrypt
Install Certbot:
sudo apt-get install certbot python3-certbot-nginx
Obtain a certificate:
sudo certbot --nginx -d grafana.yourdomain.com
Update Grafana’s /etc/grafana/grafana.ini (the grafana user must be able to read both files):
[server]
protocol = https
cert_file = /etc/letsencrypt/live/grafana.yourdomain.com/fullchain.pem
cert_key = /etc/letsencrypt/live/grafana.yourdomain.com/privkey.pem
Step 3: Configure Authentication
In Grafana’s UI:
- Navigate to Admin > Authentication > Generic OAuth.
- Set up OAuth2 with Google/GitHub or enable built-in login.
5. Integrating Prometheus with Grafana
Step 1: Add Prometheus as a Data Source
- In Grafana, go to Configuration > Data Sources > Add data source.
- Select Prometheus.
- Set URL to http://localhost:9090 (or http://prometheus:9090 if using DNS).
Step 2: Create a Dashboard
CPU Usage:
100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
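As a worked example of the arithmetic behind this query, the sketch below computes usage for a single core from two readings of the idle counter. It is a simplification: PromQL's rate() also handles counter resets and extrapolation, and the query averages across cores per instance. The function name and the sample numbers are illustrative.

```python
def cpu_usage_percent(idle_start: float, idle_end: float, window_seconds: float) -> float:
    """Usage % for one core from two readings of its idle-seconds counter."""
    # Per-second increase of the idle counter: the fraction of time spent idle.
    idle_rate = (idle_end - idle_start) / window_seconds
    return 100 - idle_rate * 100


# The idle counter advanced by 150 s during a 300 s (5m) window,
# so the core was idle half of the time.
print(cpu_usage_percent(1000.0, 1150.0, 300.0))
```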
Memory Usage:
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
Disk Usage:
100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
Network Traffic:
sum(rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])) by (device)
6. Advanced Configurations
Alerts in Grafana
- Edit a panel and navigate to the Alert tab.
- Set conditions (e.g., CPU Usage > 90%).
- Configure notifications via email, Slack, or PagerDuty.
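A threshold alert is usually paired with a duration so that a single spike does not page anyone. The sketch below is a simplified model of that idea, not Grafana's actual evaluation logic: it fires only when the most recent evaluations all exceed the threshold. The function name and parameters are hypothetical.

```python
def should_fire(samples: list[float], threshold: float, min_consecutive: int) -> bool:
    """True if the latest `min_consecutive` samples all exceed `threshold`."""
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# CPU usage (%) at successive evaluations; require three breaches in a row.
print(should_fire([50.0, 92.0, 95.0, 97.0], threshold=90.0, min_consecutive=3))
print(should_fire([92.0, 95.0, 60.0, 97.0], threshold=90.0, min_consecutive=3))
```

The first call reports a sustained breach; the second does not, because the dip to 60% interrupts the run.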
Remote Storage with Thanos
To extend Prometheus’ retention:
- Deploy Thanos Sidecar alongside Prometheus.
- Configure object storage (e.g., AWS S3) in a file passed to the sidecar with --objstore.config-file:
type: S3
config:
  bucket: "thanos-storage"
  endpoint: "s3.amazonaws.com"
Dynamic Service Discovery
Use Kubernetes SD or Consul in prometheus.yml:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
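The keep action in the relabel rule above drops every discovered target whose source label does not match the regex. Here is a rough sketch of that filtering step, assuming targets are plain dictionaries of labels; Prometheus anchors relabel regexes at both ends, which fullmatch imitates, though Python's re module is not identical to the RE2 syntax Prometheus uses. The function name and pod names are illustrative.

```python
import re


def keep_targets(targets: list[dict[str, str]], source_label: str, regex: str) -> list[dict[str, str]]:
    """Keep only targets whose source label value fully matches the regex."""
    pattern = re.compile(regex)
    # A missing label is treated as an empty string, as in relabeling.
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]


LABEL = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
discovered = [
    {"pod": "api", LABEL: "true"},
    {"pod": "batch", LABEL: "false"},
    {"pod": "legacy"},  # no scrape annotation at all
]

print([t["pod"] for t in keep_targets(discovered, LABEL, "true")])
```

Only pods annotated with prometheus.io/scrape: "true" survive the filter.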
7. Security Best Practices
Securing Prometheus
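- Use Nginx as a reverse proxy with HTTPS and basic auth. Have Nginx listen on 443 and bind Prometheus to loopback (--web.listen-address=127.0.0.1:9090) so the two do not compete for port 9090:
server {
    listen 443 ssl;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        # Forward authenticated requests to the local Prometheus instance
        proxy_pass http://127.0.0.1:9090;
    }
}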
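Once basic auth is in front of Prometheus, every client, including scripts and the Grafana data source, has to send credentials. A small sketch of how the Authorization header is formed follows; the username and password are placeholders, and the helper name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the HTTP Basic Authorization header for the given credentials."""
    # Basic auth is base64("username:password"); it is encoding, not encryption,
    # which is why it should only be sent over HTTPS.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
print(basic_auth_header("admin", "secret"))
```

In Grafana, the equivalent is enabling Basic auth on the Prometheus data source and entering the same user and password.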
Grafana Permissions
- Use role-based access control (RBAC) to limit dashboard access.
- Disable anonymous login in grafana.ini.
8. Troubleshooting
Firewall Issues
Ensure the ports that must be reachable from other hosts are open:
sudo ufw allow 9090/tcp # Prometheus
sudo ufw allow 3000/tcp # Grafana
sudo ufw allow 9100/tcp # Node Exporter
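To tell a firewall problem apart from a service that is simply not running, it helps to test whether a TCP connection to the port succeeds at all. A minimal sketch using only the standard library; the function name port_open is hypothetical, and the ports are the defaults used in this guide.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in [("Prometheus", 9090), ("Grafana", 3000), ("Node Exporter", 9100)]:
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} ({port}): {state}")
```

Run it on the server itself and again from a remote host: reachable locally but not remotely points at the firewall, while unreachable in both cases points at the service.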
Permission Errors
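Run services under dedicated users that own their install directories:
sudo useradd --no-create-home --shell /bin/false prometheus
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown -R prometheus:prometheus /opt/prometheus
sudo chown -R node_exporter:node_exporter /opt/node_exporter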
You’ve now built a secure, scalable monitoring stack with Prometheus and Grafana. From basic metric collection to advanced alerting and dynamic discovery, this setup is ready for production workloads. To deepen your expertise:
- Explore Grafana Loki for log aggregation alongside your metrics.
- Integrate Alertmanager for centralized alert routing.
- Experiment with custom exporters for application-specific metrics.
With these tools, you’re equipped to turn raw metrics into actionable insights, ensuring system reliability and performance.