Building a Monitoring System with Grafana + InfluxDB + Collectd

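Architecture

Collectd (data collection; its Server setting points at port 25826 on InfluxDB) -> InfluxDB (data storage; the collectd plugin is enabled and listens on port 25826) -> Grafana (data visualization)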

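  • Collectd: a daemon written in C that periodically collects statistics and stores them. It has a rich set of plugins, including ones for monitoring Ceph, DRBD, OpenLDAP, and ZooKeeper, and is similar to StatsD. (Graphite can also be used for data collection, but its display features are less rich than Grafana's.) Data can be stored in Kafka, InfluxDB, OpenTSDB, and others.
  • InfluxDB: an open-source distributed time-series database written in Go, suited to storing metrics, events, and analytics data.
  • Grafana: an open-source tool for data display and chart editing with rich metric dashboards. It supports Graphite, Elasticsearch, OpenTSDB, Prometheus, InfluxDB, Zabbix, and others.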

Collectd

  1. Install collectd

    yum -y install perl-ExtUtils-Embed perl-ExtUtils-MakeMaker liboping*
    wget https://collectd.org/files/collectd-5.5.0.tar.gz
    tar xf collectd-5.5.0.tar.gz
    cd collectd-5.5.0
    ./configure --enable-cpu --enable-df --enable-disk --enable-interface --enable-load --enable-memory --enable-ping --enable-swap --enable-users --enable-uptime
    make && make install
    cp contrib/redhat/init.d-collectd /etc/rc.d/init.d/collectd
    chmod +x /etc/rc.d/init.d/collectd
    ln -s /opt/collectd/sbin/collectdmon /usr/sbin/
    ln -s /opt/collectd/sbin/collectd /usr/sbin/
  2. Configure collectd

    vim /etc/collectd.conf
    BaseDir "/opt/collectd"
    PIDFile "/run/collectd.pid"
    Hostname "host.example.com"
    Interval 60
    <LoadPlugin df>
      Interval 120
    </LoadPlugin>
    LoadPlugin disk
    LoadPlugin interface
    LoadPlugin load
    LoadPlugin memory
    LoadPlugin network
    LoadPlugin processes
    LoadPlugin users
    <Plugin interface>
      Interface "eth1"
      IgnoreSelected false
    </Plugin>
    <Plugin network>
      Server "10.44.38.244" "25826"
    </Plugin>
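  3. Notes
    By default the collectd process calls the plugins registered in the configuration file every 10 seconds: the global Interval parameter defaults to 10s, so data is reported to InfluxDB (or another backend) every 10 seconds. A different collection Interval can be set for individual plugins; the configuration above sets the global value to 60 seconds and uses 120 seconds for df.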
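
To make the interval rules concrete, here is a minimal Python sketch that reads a collectd.conf-style text and reports the interval each loaded plugin would end up with. The helper name effective_intervals and its simplified parsing are assumptions made for illustration (it only understands LoadPlugin lines, <LoadPlugin> blocks, and Interval, and treats every "#" as a comment start); it is not part of collectd.

```python
import re

# Simplified patterns for the three constructs this sketch understands.
OPEN_LOADPLUGIN = re.compile(r'<LoadPlugin\s+"?([\w-]+)"?\s*>$', re.IGNORECASE)
PLAIN_LOADPLUGIN = re.compile(r'LoadPlugin\s+"?([\w-]+)"?$', re.IGNORECASE)
INTERVAL = re.compile(r"Interval\s+(\d+(?:\.\d+)?)$", re.IGNORECASE)


def effective_intervals(conf_text, default=10.0):
    """Return {plugin: interval in seconds} for every plugin loaded in conf_text.

    Hypothetical helper: a per-plugin Interval inside a <LoadPlugin> block wins,
    otherwise the global Interval applies, otherwise the 10s default.
    """
    global_interval = default
    overrides = {}   # plugin name -> per-plugin interval, or None
    current = None   # plugin whose <LoadPlugin> block we are inside
    depth = 0        # nesting depth of other blocks such as <Plugin ...>
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        opened = OPEN_LOADPLUGIN.match(line)
        plain = PLAIN_LOADPLUGIN.match(line)
        interval = INTERVAL.match(line)
        if opened:
            current = opened.group(1)
            overrides.setdefault(current, None)
        elif line.lower() == "</loadplugin>":
            current = None
        elif line.startswith("</"):
            depth = max(depth - 1, 0)
        elif line.startswith("<"):
            depth += 1
        elif plain:
            overrides.setdefault(plain.group(1), None)
        elif interval:
            value = float(interval.group(1))
            if current is not None:
                overrides[current] = value
            elif depth == 0:
                global_interval = value
    return {
        name: global_interval if value is None else value
        for name, value in overrides.items()
    }


SAMPLE = """
Interval 60
<LoadPlugin df>
  Interval 120
</LoadPlugin>
LoadPlugin disk
LoadPlugin network
<Plugin network>
  Server "10.44.38.244" "25826"
</Plugin>
"""

if __name__ == "__main__":
    for plugin, seconds in effective_intervals(SAMPLE).items():
        print(plugin, seconds)
```

With the sample text, df resolves to 120 seconds while disk and network inherit the global 60 seconds.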

InfluxDB

  1. Install and start the service

    cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
    [influxdb]
    name = InfluxDB Repository - RHEL \$releasever
    baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
    enabled = 1
    gpgcheck = 1
    gpgkey = https://repos.influxdata.com/influxdb.key
    EOF
    yum -y install influxdb
    service influxdb start
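    After startup, TCP port 8083 serves the InfluxDB admin console.
    TCP port 8086 serves the HTTP API that clients use to talk to InfluxDB.
    InfluxDB user authentication is disabled by default after startup; first create a user (username geekwolf, password geekwolf).
    Run influx on the command line.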
  2. Basic usage

    [root@geekwolf ~]# influx
    Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
    Connected to http://localhost:8086 version 0.12.2
    InfluxDB shell 0.12.2
    > create database collectdb
    > show databases
    name: databases
    ---------------
    name
    _internal
    collectdb
    > create user geekwolf with password 'geekwolf'
    > show users
    user      admin
    geekwolf  false
    > grant all on collectdb to geekwolf
    > help show
    Usage:
    connect <host:port>   connects to another node specified by host:port
    auth                  prompts for username and password
    pretty                toggles pretty print for the json format
    use <db_name>         sets current database
    format <format>       specifies the format of the server responses: json, csv, or column
    precision <format>    specifies the format of the timestamp: rfc3339, h, m, s, ms, u or ns
    consistency <level>   sets write consistency level: any, one, quorum, or all
    history               displays command history
    settings              outputs the current settings for the shell
    exit/quit/ctrl+d      quits the influx shell
    show databases        show database names
    show series           show series information
    show measurements     show measurement information
    show tag keys         show tag key information
    show field keys       show field key information
    A full list of influxql commands can be found at:
    https://docs.influxdata.com/influxdb/v0.10/query_language/spec
  3. Enable authentication

    Edit the configuration file to enable authentication:
    sed -i 's#auth-enabled = false#auth-enabled = true#g' /etc/influxdb/influxdb.conf
    service influxdb restart
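
Once authentication is on, clients of the HTTP API on port 8086 have to send credentials. The sketch below shows one way this could look from Python using only the standard library, passing the query and credentials as the q, db, u, and p parameters of the /query endpoint. The function names build_query_url and run_query are hypothetical, and the host and account are simply the values used in this article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, query, db=None, username=None, password=None):
    """Build a /query URL for the InfluxDB HTTP API (hypothetical helper)."""
    params = {"q": query}
    if db:
        params["db"] = db
    if username is not None:
        # With auth-enabled = true, credentials can travel as u/p parameters.
        params["u"] = username
        params["p"] = password or ""
    return "{}/query?{}".format(base_url.rstrip("/"), urlencode(params))


def run_query(url, timeout=5):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_query_url(
        "http://localhost:8086",
        "SHOW DATABASES",
        username="geekwolf",
        password="geekwolf",
    )
    print(url)
    # print(run_query(url))  # uncomment when InfluxDB is reachable
```

Credentials in a URL end up in logs and shell history, so HTTP basic authentication may be preferable outside a test setup.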

Configure InfluxDB to support Collectd

  1. Edit the configuration

    vim /etc/influxdb/influxdb.conf
    [collectd]
    enabled = true
    bind-address = "10.44.38.244:25826"
    database = "collectdb"
    typesdb = "/opt/collectd/share/collectd/types.db"
    batch-size = 5000
    batch-pending = 10
    batch-timeout = "10s"
    read-buffer = 0
    service influxdb restart
  2. View the metrics

    [root@geekwolf ~]# influx
    Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
    Connected to http://localhost:8086 version 0.12.2
    InfluxDB shell 0.12.2
    > use collectdb
    Using database collectdb
    > show field keys
    name: cpu_value
    ---------------
    fieldKey
    value
    name: df_free
    -------------
    fieldKey
    value
    name: df_used
    -------------
    fieldKey
    value
    name: disk_read
    ---------------
    fieldKey
    value
    > select * from cpu_value limit 15;
    name: cpu_value
    ---------------
    time                 host              instance  type  type_instance  value
    1461657293000000000  host.example.com  1         cpu   idle           1.59845e+06
    1461657293000000000  host.example.com  1         cpu   system         2316
    1461657293000000000  host.example.com  1         cpu   nice           508
    1461657293000000000  host.example.com  0         cpu   steal          0
    1461657293000000000  host.example.com  1         cpu   user           11619
    1461657293000000000  host.example.com  1         cpu   interrupt      0
    1461657293000000000  host.example.com  1         cpu   steal          0
    1461657293000000000  host.example.com  1         cpu   wait           172
    1461657293000000000  host.example.com  1         cpu   softirq        0
    1461657303000000000  host.example.com  1         cpu   wait           172
    1461657303000000000  host.example.com  1         cpu   softirq        0
    1461657303000000000  host.example.com  1         cpu   nice           508
    1461657303000000000  host.example.com  0         cpu   idle           1.587007e+06
    1461657303000000000  host.example.com  0         cpu   softirq        127
    1461657303000000000  host.example.com  0         cpu   interrupt      54
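
The time column in this output is a Unix timestamp in nanoseconds, and the cpu values look like cumulative counters rather than percentages (an assumption based on the sample rows). The following Python sketch shows how such rows could be interpreted: it converts a nanosecond timestamp to a UTC datetime and derives a per-second rate from two samples. Both function names are hypothetical.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def ns_to_datetime(ns):
    """Convert a nanosecond Unix timestamp to a timezone-aware UTC datetime."""
    seconds, remainder = divmod(ns, NANOS_PER_SECOND)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only resolves microseconds, so sub-microsecond digits are dropped.
    return moment.replace(microsecond=remainder // 1000)


def per_second_rate(t0_ns, v0, t1_ns, v1):
    """Rate of change between two samples of a cumulative counter."""
    elapsed = (t1_ns - t0_ns) / NANOS_PER_SECOND
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed


if __name__ == "__main__":
    # Two samples of "cpu 1 wait" taken from the query output above.
    first = (1461657293000000000, 172)
    second = (1461657303000000000, 172)
    print(ns_to_datetime(first[0]).isoformat())
    print(per_second_rate(first[0], first[1], second[0], second[1]))
```

The two sample timestamps are 10 seconds apart, which matches a 10-second reporting interval.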

Install and configure Grafana

yum install https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.0-beta51460725904.x86_64.rpm
Directory layout:
/usr/sbin/grafana-server
/etc/init.d/grafana-server       startup (init) script for the binary above
/etc/sysconfig/grafana-server    environment variables
/etc/grafana/grafana.ini         configuration file
/var/log/grafana/grafana.log     log file
/var/lib/grafana/grafana.db      sqlite3 database
Start the service: service grafana-server start
Enable at boot: chkconfig grafana-server on

URL: http://10.44.38.244:3000 (default account: admin / admin)

Disable Grafana sign-up:

sed -i 's/^[;#] *allow_sign_up = true/allow_sign_up = false/g' /etc/grafana/grafana.ini
Then restart the service.

  • Add the InfluxDB data source

  • Example: add a ping graph

  • Chart display
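
A graph panel such as the ping example is ultimately backed by an InfluxQL query. The Python sketch below builds one such query string. The measurement name ping_value and the type = 'ping' tag are assumptions modeled on the cpu_value measurement shown earlier, and build_mean_query is a hypothetical helper; check the real names with show measurements and show tag keys.

```python
def build_mean_query(measurement, tags=None, window="1h", bucket="1m"):
    """Build an InfluxQL query averaging "value" per time bucket.

    Hypothetical helper: identifiers are double-quoted and string values
    single-quoted, as InfluxQL expects.
    """
    conditions = []
    for key, value in (tags or {}).items():
        escaped = str(value).replace("'", "\\'")
        conditions.append("\"{}\" = '{}'".format(key, escaped))
    conditions.append("time > now() - {}".format(window))
    where = " AND ".join(conditions)
    return 'SELECT mean("value") FROM "{}" WHERE {} GROUP BY time({})'.format(
        measurement, where, bucket
    )


if __name__ == "__main__":
    # Assumed measurement and tag names; adjust to what InfluxDB actually holds.
    print(build_mean_query("ping_value", {"type": "ping"}))
```

Inside Grafana's query editor the fixed time range and bucket would normally be replaced by the $timeFilter and $interval macros so the panel follows the dashboard's time picker.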

For a detailed demo, see http://play.grafana.org/

Troubleshooting

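Problem: with InfluxDB 0.12.x and Grafana 2.6, a "multiple query syntax" bug appears. The cause is on the InfluxDB API side (the source text breaks off here at "api went").

Solution: upgrade from Grafana 2.6 to Grafana 3.0-beta1 or later.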
https://github.com/grafana/grafana/commit/ed62822d442569e7ba287ff63d83a069a596c458

References

http://docs.grafana.org
https://collectd.org/wiki/index.php/Table_of_Plugins
https://docs.influxdata.com/influxdb/v0.12/introduction/getting_started/
