Prometheus Notes

From Federal Burro of Information
Revision as of 16:07, 29 July 2020

PromQL

node exporter:

node_memory_MemAvailable_bytes{job=~"myjob.*"} / on ( instance ) node_memory_MemTotal_bytes{job=~"myjob.*"}
node_memory_MemFree_bytes{job=~"myjob.*"} / on ( instance ) node_memory_MemTotal_bytes{job=~"myjob.*"}
sum(kube_pod_container_resource_requests_cpu_cores) / sum(kube_node_status_capacity_cpu_cores) * 100


topk(
10,
count({job="prometheus"}) by (__name__)
)
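
As a rough illustration of what the topk/count query above computes, here is a minimal Python sketch. It assumes the series are available as plain label dictionaries; the sample metric names and the helper `top_metric_names` are hypothetical and only stand in for whatever the prometheus job actually exposes.

```python
from collections import Counter

def top_metric_names(series, k=10):
    """Count series per metric name and return the k largest,
    mirroring topk(k, count(...) by (__name__))."""
    counts = Counter(labels["__name__"] for labels in series)
    return counts.most_common(k)

# Hypothetical label sets for job="prometheus" (made up for illustration).
series = [
    {"__name__": "http_requests_total", "job": "prometheus", "code": "200"},
    {"__name__": "http_requests_total", "job": "prometheus", "code": "404"},
    {"__name__": "http_requests_total", "job": "prometheus", "code": "500"},
    {"__name__": "up", "job": "prometheus"},
    {"__name__": "go_gc_duration_seconds", "job": "prometheus", "quantile": "0.5"},
    {"__name__": "go_gc_duration_seconds", "job": "prometheus", "quantile": "1"},
]

top = top_metric_names(series, k=2)
print(top)
```

The metric names with the most series are usually the ones worth checking for high-cardinality labels.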

renaming metrics

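scrape_configs:
  - job_name: sql
    static_configs:
      - targets: ['172.21.132.39:41212']
    metric_relabel_configs:
      # copy the value of prometheus_metric_name into the metric name,
      # stripping any trailing underscores
      - source_labels: ['prometheus_metric_name']
        target_label: '__name__'
        regex: '(.*[^_])_*'
        replacement: '${1}'
      # then drop the helper label
      - regex: prometheus_metric_name
        action: labeldrop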

turns this:

query_result_dm_os_performance_counters{
  counter_instance="ex01",
  counter_name="log file(s) size (kb)",
  prometheus_metric_name="sqlserver_databases",
}

into:

sqlserver_databases{
  counter_instance="ex01",
  counter_name="log file(s) size (kb)",
}
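
To see why the two relabel rules produce that result, here is a small Python simulation of the "replace" and "labeldrop" actions. It is only a sketch of the documented behaviour (the regex is anchored against the whole source value, and `${1}` refers to the first capture group); the function names are made up for this example and it is not how Prometheus itself is implemented.

```python
import re

def relabel_replace(labels, source_labels, target_label, regex,
                    replacement="${1}", separator=";"):
    """Sketch of the default 'replace' action."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabel regexes are fully anchored
    if match is None:
        return dict(labels)             # no match: labels stay untouched
    # Expand ${N} references using the capture groups of the match.
    new_value = re.sub(r"\$\{(\d+)\}",
                       lambda m: match.group(int(m.group(1))),
                       replacement)
    out = dict(labels)
    out[target_label] = new_value
    return out

def relabel_labeldrop(labels, regex):
    """Sketch of 'labeldrop': remove labels whose NAME matches the regex."""
    return {k: v for k, v in labels.items() if re.fullmatch(regex, k) is None}

before = {
    "__name__": "query_result_dm_os_performance_counters",
    "counter_instance": "ex01",
    "counter_name": "log file(s) size (kb)",
    "prometheus_metric_name": "sqlserver_databases",
}

step1 = relabel_replace(before, ["prometheus_metric_name"], "__name__",
                        r"(.*[^_])_*")
after = relabel_labeldrop(step1, r"prometheus_metric_name")
print(after)
```

The `(.*[^_])_*` pattern captures everything up to the last non-underscore character, so a value such as `sqlserver_databases__` would also end up as `sqlserver_databases`.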

dirty install node exporter

# amd64
curl -L -o /tmp/node_exporter-1.0.1.linux-amd64.tar.gz https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar zxvf /tmp/node_exporter-1.0.1.linux-amd64.tar.gz -C /tmp/
cp /tmp/node_exporter-1.0.1.linux-amd64/node_exporter /usr/bin/prometheus-node-exporter

# armv6: use this block instead of the amd64 one
curl -L -o /tmp/node_exporter-1.0.1.linux-armv6.tar.gz https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-armv6.tar.gz
tar zxvf /tmp/node_exporter-1.0.1.linux-armv6.tar.gz -C /tmp/
cp /tmp/node_exporter-1.0.1.linux-armv6/node_exporter /usr/bin/prometheus-node-exporter

chmod 755 /usr/bin/prometheus-node-exporter
chown root:root /usr/bin/prometheus-node-exporter

cat << 'EOF' > /etc/default/prometheus-node-exporter
ARGS="--collector.diskstats.ignored-devices=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$ \
      --collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/) \
      --collector.netclass.ignored-devices=^lo$ \
      --collector.systemd"
EOF
  
chown root:root /etc/default/prometheus-node-exporter
chmod 644 /etc/default/prometheus-node-exporter

cat << 'EOF' > /lib/systemd/system/prometheus-node-exporter.service
[Unit]
Description=Prometheus exporter for machine metrics
Documentation=https://github.com/prometheus/node_exporter
[Service]
Restart=always
User=nobody
EnvironmentFile=/etc/default/prometheus-node-exporter
ExecStart=/usr/bin/prometheus-node-exporter $ARGS
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
EOF

chown root:root /lib/systemd/system/prometheus-node-exporter.service
chmod 644 /lib/systemd/system/prometheus-node-exporter.service

systemctl daemon-reload
systemctl enable prometheus-node-exporter.service
systemctl start prometheus-node-exporter.service
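
The `--collector.diskstats.ignored-devices` pattern in the ARGS above is easy to misread, so here is a quick Python check of which device names it would ignore. This is a sketch only: node_exporter uses Go's regexp engine rather than Python's `re`, though this particular pattern should behave the same in both, and the device names are just common examples.

```python
import re

# Same pattern as in /etc/default/prometheus-node-exporter.
IGNORED_DEVICES = re.compile(
    r"^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"
)

# Example device names (assumed, not taken from a real host).
devices = ["sda", "sda1", "xvda1", "nvme0n1", "nvme0n1p1", "loop0", "ram0", "dm-0"]

ignored = [d for d in devices if IGNORED_DEVICES.search(d)]
kept = [d for d in devices if not IGNORED_DEVICES.search(d)]

print("ignored:", ignored)
print("kept:   ", kept)
```

In short, the pattern drops partitions, loop devices and ram disks, and keeps whole disks such as `sda` and `nvme0n1`.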

cpu usage from cpu seconds

100 - (avg by (job) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

reference: https://www.robustperception.io/understanding-machine-cpu-usage
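
A worked example of the arithmetic behind the query above, as a minimal Python sketch. It assumes each idle series is a list of (timestamp, value) samples, approximates irate as the per-second increase between the last two samples, and ignores counter resets; the sample values are invented.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) samples.
    Counter resets are not handled in this sketch."""
    (t1, v1), (t2, v2) = samples[-2:]
    return (v2 - v1) / (t2 - t1)

def cpu_usage_percent(idle_series):
    """100 - (average idle rate across CPUs) * 100."""
    rates = [irate(samples) for samples in idle_series]
    return 100 - (sum(rates) / len(rates)) * 100

# Made-up node_cpu_seconds_total{mode="idle"} samples, 15 s apart.
idle_series = [
    [(0, 1000.0), (15, 1012.0)],  # cpu0: idle for 12 of 15 seconds
    [(0, 2000.0), (15, 2009.0)],  # cpu1: idle for 9 of 15 seconds
]

usage = cpu_usage_percent(idle_series)
print(round(usage, 2))  # roughly 30 percent busy
```

Each idle rate is the fraction of a second the CPU spent idle per second, so averaging the rates and subtracting from 100 gives the busy percentage.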


resources

https://timber.io/blog/promql-for-humans/

https://www.weave.works/blog/promql-queries-for-the-rest-of-us/

https://promcon.io/2018-munich/slides/taking-advantage-of-relabeling.pdf

https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085

https://www.robustperception.io/extracting-full-labels-from-consul-tags

https://blog.freshtracks.io/prometheus-relabel-rules-and-the-action-parameter-39c71959354a