Kubernetes and Monitoring: Prometheus Remote Storage

alighters, published 2019-07-01 17:31

Prometheus remote storage

Preface

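Prometheus has proven itself in the container cloud space: more and more cloud-native components expose a Prometheus metrics endpoint directly, with no extra exporter needed. That makes Prometheus a reasonable choice as the monitoring solution for an entire cluster. For metrics storage, Prometheus provides local storage, namely its TSDB time-series database. The advantage of local storage is operational simplicity: starting Prometheus takes a single command, and the two startup flags below set the data path and the retention period.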

--storage.tsdb.path: path to the TSDB database, default data/

--storage.tsdb.retention: how long to retain data, default 15 days

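The drawback is that local storage cannot persist a large volume of metrics, although data compression has improved considerably since Prometheus 2.0.
To work around the limits of single-node storage, Prometheus does not implement clustered storage itself. Instead it exposes remote read and write interfaces, so users can pick a suitable time-series database to scale Prometheus.
Prometheus integrates with remote storage systems in the following two ways: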

Prometheus writes metrics to a remote storage endpoint in a standard format

Prometheus reads metrics from a remote URL in a standard format

The rest of this article focuses on the remote storage options.

Remote storage options

Configuration file

Remote write

# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we start dropping them.
  [ capacity: <int> | default = 100000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100 ]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Maximum number of times to retry a batch on recoverable errors.
  [ max_retries: <int> | default = 10 ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]
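
To make the retry settings in queue_config concrete, here is a minimal Python sketch of the schedule that the comments describe: the delay starts at min_backoff, doubles on every retry, and is capped at max_backoff. The function name retry_delays_ms is hypothetical and only illustrates the arithmetic; it is not Prometheus code.

```python
from typing import List


def retry_delays_ms(min_backoff_ms: int, max_backoff_ms: int, max_retries: int) -> List[int]:
    """Return the wait (in ms) before each retry.

    Hypothetical helper: starts at min_backoff, doubles after every retry,
    and never exceeds max_backoff, as the config comments describe.
    """
    delays = []
    delay = min_backoff_ms
    for _ in range(max_retries):
        delays.append(delay)
        delay = min(delay * 2, max_backoff_ms)
    return delays


if __name__ == "__main__":
    # Defaults from the listing: min_backoff=30ms, max_backoff=100ms, max_retries=10.
    print(retry_delays_ms(30, 100, 10))
```

With the documented defaults the cap is reached on the third attempt, so under this reading a failing batch spends well under a second in backoff before it is given up.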

Remote read

# The URL of the endpoint to query from.
url: <string>

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
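
The required_matchers option is easy to misread, so here is a small Python sketch of the rule its comment states: the remote read endpoint is queried only when the selector contains every required label=value pair as an equality matcher. The function should_query_remote and the dict representation of a selector are assumptions made for illustration, not Prometheus internals.

```python
from typing import Dict


def should_query_remote(selector: Dict[str, str], required_matchers: Dict[str, str]) -> bool:
    """Hypothetical check: True only if every required label=value pair
    appears as an equality matcher in the query selector."""
    return all(selector.get(name) == value for name, value in required_matchers.items())


if __name__ == "__main__":
    required = {"cid": "9"}
    # The selector up{cid="9"} carries the required matcher, so the remote is queried.
    print(should_query_remote({"__name__": "up", "cid": "9"}, required))
    # The selector up lacks it, so the query would stay local.
    print(should_query_remote({"__name__": "up"}, required))
```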

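The write_relabel_configs option in the remote write configuration takes advantage of Prometheus's powerful relabeling feature. It lets you filter which metrics are written to remote storage.

For example, to keep only the specified metrics: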

remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
      # Keep only series whose metric name matches the regex; all others are dropped.
      - action: keep
        source_labels: [__name__]
        regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
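
The effect of the keep action can be sketched in a few lines of Python. Prometheus anchors relabel regexes at both ends, so the sketch uses re.fullmatch; the helper name keep_by_name and the sample metric names in the usage section are illustrative assumptions, and Python's re module stands in for the RE2 engine that Prometheus actually uses.

```python
import re
from typing import Iterable, List


def keep_by_name(metric_names: Iterable[str], regex: str) -> List[str]:
    """Hypothetical model of `action: keep` on `__name__`:
    keep a name only if the whole name matches the regex."""
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    regex = (
        "container_network_receive_bytes_total"
        "|container_network_receive_packets_dropped_total"
    )
    names = [
        "container_network_receive_bytes_total",
        "container_network_receive_packets_dropped_total",
        "container_network_receive_bytes_total_extra",  # partial match only, dropped
        "container_cpu_usage_seconds_total",            # no match, dropped
    ]
    print(keep_by_name(names, regex))
```

Because the match is anchored, a name that merely starts with one of the alternatives is not kept.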

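For external_labels in the global configuration: when using Prometheus federation or remote read and write, consider setting this option so that the clusters can be told apart.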

global:
  scrape_interval: 20s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    cid: "9"
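
As a rough illustration of how external_labels distinguishes clusters, the Python sketch below attaches the external labels to a series' label set before it leaves Prometheus. The function with_external_labels is hypothetical, and the conflict rule (a label already present on the series keeps its own value) is an assumption of this sketch rather than something stated in the article.

```python
from typing import Dict


def with_external_labels(series_labels: Dict[str, str], external_labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical merge: add external labels to a series.

    Assumption: on a name clash the series' own label value wins.
    """
    merged = dict(external_labels)
    merged.update(series_labels)  # series labels override external ones
    return merged


if __name__ == "__main__":
    series = {"__name__": "up", "job": "node"}
    # Every series sent from this cluster would carry cid="9".
    print(with_external_labels(series, {"cid": "9"}))
```

With a distinct cid per cluster, series with otherwise identical labels remain separable in the shared remote store.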

Existing remote storage options

The community has already implemented the following remote storage integrations:

AppOptics: write

Chronix: write

Cortex: read and write

CrateDB: read and write

Elasticsearch: write

Gnocchi: write

Graphite: write

InfluxDB: read and write

OpenTSDB: write

PostgreSQL/TimescaleDB: read and write

SignalFx: write

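Some of the storage systems above support writes only. From reading the source code, whether remote read can be supported depends on whether the storage supports regular-expression matching in queries. The next installment will walk through the implementation, looking at prometheus-postgresql-adapter and at how to write an adapter of your own.
Options that support both remote read and write: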

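Cortex comes from Weaveworks. Its architecture wraps Prometheus in a higher layer and uses many components, so it is somewhat complex.

InfluxDB: the open-source edition does not support clustering. With a large volume of metrics the write pressure is high, and the influxdb-relay approach is not truly highly available. Ele.me has open-sourced influxdb-proxy, which is worth trying if you are interested.

CrateDB is based on Elasticsearch. I do not know much about it.

TimescaleDB is my personal favorite. Traditional operations teams know PostgreSQL well, so running it is dependable. It currently supports high availability through streaming replication.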

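Afterword

If the collected metrics are meant for data analysis, ClickHouse is worth considering for its clustering, its write performance, and its support for remote read and write. I am still looking into this and will write a dedicated article once there are some results. For now, we plan to use TimescaleDB as our persistence solution.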

When reposting, please cite this article's address: http://specialneedsforspecialkids.com/yun/33070.html
