.. _Methodology_for_Containerized_Openstack_Monitoring:

**************************************************
Methodology for Containerized Openstack Monitoring
**************************************************

:Abstract:

  This document describes one possible monitoring solution for Containerized
  OpenStack. The solution aims to provide a scalable and comprehensive
  architecture and to obtain all crucial performance metrics at each layer
  of the stack.

Containerized Openstack Monitoring Architecture
===============================================

This part of the document describes the performance metrics required at each
layer of Containerized OpenStack.

Containerized OpenStack comprises three layers, and the monitoring system
should be able to query all necessary counters at each of them:

- OS layer
- Kubernetes layer
- OpenStack layer

Monitoring instruments are logically divided into two groups:

- Monitoring server side
- Node client side

Operating System Layer
----------------------

We used Ubuntu Xenial on bare-metal servers for both the server side and the node side.

Baremetal hardware description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We deployed everything in a 200-server environment with the following hardware characteristics:

.. table::

  +-------+----------------+------------------------+
  |server |vendor,model    |HP,DL380 Gen9           |
  +-------+----------------+------------------------+
  |CPU    |vendor,model    |Intel,E5-2680 v3        |
  |       +----------------+------------------------+
  |       |processor_count |2                       |
  |       +----------------+------------------------+
  |       |core_count      |12                      |
  |       +----------------+------------------------+
  |       |frequency_MHz   |2500                    |
  +-------+----------------+------------------------+
  |RAM    |vendor,model    |HP,752369-081           |
  |       +----------------+------------------------+
  |       |amount_MB       |262144                  |
  +-------+----------------+------------------------+
  |NETWORK|interface_name  |p1p1                    |
  |       +----------------+------------------------+
  |       |vendor,model    |Intel,X710 Dual Port    |
  |       +----------------+------------------------+
  |       |bandwidth       |10G                     |
  +-------+----------------+------------------------+
  |STORAGE|dev_name        |/dev/sda                |
  |       +----------------+------------------------+
  |       |vendor,model    | | raid10 - HP P840     |
  |       |                | | 12 disks EH0600JEDHE |
  |       +----------------+------------------------+
  |       |SSD/HDD         |HDD                     |
  |       +----------------+------------------------+
  |       |size            |3.6TB                   |
  +-------+----------------+------------------------+

Operating system configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Baremetal nodes were provisioned with Cobbler using our in-house preseed scripts.
OS versions we used:

.. table:: Operating system versions

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |Ubuntu              |Ubuntu 16.04.1 LTS                       |
  +--------------------+-----------------------------------------+
  |Kernel              |4.4.0-47-generic                         |
  +--------------------+-----------------------------------------+

The contents of the /etc folder from one of the typical systems we used are available here:

:download:`etc_tarball <configs/node1.tar.gz>`

Required system metrics
^^^^^^^^^^^^^^^^^^^^^^^

At this layer we must monitor the following processes:

.. table::

  +------------------------+-----------------------------------------+
  |List of processes       |Mariadb                                  |
  |                        +-----------------------------------------+
  |                        |Rabbitmq                                 |
  |                        +-----------------------------------------+
  |                        |Keystone                                 |
  |                        +-----------------------------------------+
  |                        |Glance                                   |
  |                        +-----------------------------------------+
  |                        |Cinder                                   |
  |                        +-----------------------------------------+
  |                        |Nova                                     |
  |                        +-----------------------------------------+
  |                        |Neutron                                  |
  |                        +-----------------------------------------+
  |                        |Openvswitch                              |
  |                        +-----------------------------------------+
  |                        |Kubernetes                               |
  +------------------------+-----------------------------------------+

We must also collect the following metrics:

.. table::

  +------------------------+-----------------------------------------+
  |Node load average       |1min                                     |
  |                        +-----------------------------------------+
  |                        |5min                                     |
  |                        +-----------------------------------------+
  |                        |15min                                    |
  +------------------------+-----------------------------------------+
  |Global process stats    |Running                                  |
  |                        +-----------------------------------------+
  |                        |Stopped                                  |
  |                        +-----------------------------------------+
  |                        |Waiting                                  |
  +------------------------+-----------------------------------------+
  |Global CPU Usage        |Steal                                    |
  |                        +-----------------------------------------+
  |                        |Wait                                     |
  |                        +-----------------------------------------+
  |                        |User                                     |
  |                        +-----------------------------------------+
  |                        |System                                   |
  |                        +-----------------------------------------+
  |                        |Interrupt                                |
  |                        +-----------------------------------------+
  |                        |Nice                                     |
  |                        +-----------------------------------------+
  |                        |Idle                                     |
  +------------------------+-----------------------------------------+
  |Per CPU Usage           |User                                     |
  |                        +-----------------------------------------+
  |                        |System                                   |
  +------------------------+-----------------------------------------+
  |Global memory usage     |Bandwidth                                |
  |                        +-----------------------------------------+
  |                        |Cached                                   |
  |                        +-----------------------------------------+
  |                        |Buffered                                 |
  |                        +-----------------------------------------+
  |                        |Free                                     |
  |                        +-----------------------------------------+
  |                        |Used                                     |
  |                        +-----------------------------------------+
  |                        |Total                                    |
  +------------------------+-----------------------------------------+
  |Numa monitoring         |Numa_hit                                 |
  |For each node           +-----------------------------------------+
  |                        |Numa_miss                                |
  |                        +-----------------------------------------+
  |                        |Numa_foreign                             |
  |                        +-----------------------------------------+
  |                        |Local_node                               |
  |                        +-----------------------------------------+
  |                        |Other_node                               |
  +------------------------+-----------------------------------------+
  |Numa monitoring         |Huge                                     |
  |For each pid            +-----------------------------------------+
  |                        |Heap                                     |
  |                        +-----------------------------------------+
  |                        |Stack                                    |
  |                        +-----------------------------------------+
  |                        |Private                                  |
  +------------------------+-----------------------------------------+
  |Global IOSTAT \+        |Merged reads/s                           |
  |Per device IOSTAT       +-----------------------------------------+
  |                        |Merged writes/s                          |
  |                        +-----------------------------------------+
  |                        |read/s                                   |
  |                        +-----------------------------------------+
  |                        |write/s                                  |
  |                        +-----------------------------------------+
  |                        |Read transfer                            |
  |                        +-----------------------------------------+
  |                        |Write transfer                           |
  |                        +-----------------------------------------+
  |                        |Read latency                             |
  |                        +-----------------------------------------+
  |                        |Write latency                            |
  |                        +-----------------------------------------+
  |                        |Queue size                               |
  |                        +-----------------------------------------+
  |                        |Await                                    |
  +------------------------+-----------------------------------------+
  |Network per interface   |Octets/s (in, out)                       |
  |                        +-----------------------------------------+
  |                        |Packets/s (in, out)                      |
  |                        +-----------------------------------------+
  |                        |Dropped/s                                |
  +------------------------+-----------------------------------------+
  |Other system metrics    |Entropy                                  |
  |                        +-----------------------------------------+
  |                        |DF per device                            |
  +------------------------+-----------------------------------------+

Kubernetes Layer
----------------

`Kargo`_ from `Fuel-CCP-installer`_ was our main tool for deploying K8S
on top of the provisioned systems (monitored nodes).

Kargo sets up Kubernetes in the following way:

- masters: Calico, Kubernetes API services
- nodes: Calico, Kubernetes minion services
- etcd: etcd service

Kargo deployment parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can find the Kargo deployment script in the `Kargo deployment script`_ section.
The deployment used the following parameters:

.. code:: yaml

  docker_options: "--insecure-registry 172.20.8.35:5000 -D"
  upstream_dns_servers: [172.20.8.34, 8.8.4.4]
  nameservers: [172.20.8.34, 8.8.4.4]
  kube_service_addresses: 10.224.0.0/12
  kube_pods_subnet: 10.240.0.0/12
  kube_network_node_prefix: 22
  kube_apiserver_insecure_bind_address: "0.0.0.0"
  dns_replicas: 3
  dns_cpu_limit: "100m"
  dns_memory_limit: "512Mi"
  dns_cpu_requests: "70m"
  dns_memory_requests: "70Mi"
  deploy_netchecker: false

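
The address arithmetic behind the network parameters can be checked with a short sketch. It is written in Python for illustration only and is not part of the deployment scripts. It assumes that ``kube_network_node_prefix`` is the prefix length of the address block each node receives from ``kube_pods_subnet``; the variable names are ours.

```python
import ipaddress

# Values copied from the Kargo parameters listed above.
service_net = ipaddress.ip_network("10.224.0.0/12")
pod_net = ipaddress.ip_network("10.240.0.0/12")
node_prefix = 22

# The two ranges should not overlap, otherwise service and pod IPs could collide.
overlap = service_net.overlaps(pod_net)

# Assumption: every node gets one /22 block carved out of the pod subnet.
node_blocks = 2 ** (node_prefix - pod_net.prefixlen)
addresses_per_block = 2 ** (32 - node_prefix)

print(f"overlap: {overlap}")
print(f"node blocks available: {node_blocks}")
print(f"addresses per node block: {addresses_per_block}")
```

Under this assumption the pod subnet leaves ample room for a 200-node cluster.
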

.. table::

  +----------------------+-----------------------------------------+
  |Software              |Version                                  |
  +----------------------+-----------------------------------------+
  |`Fuel-CCP-Installer`_ |6fd81252cb2d2c804f388337aa67d4403700f094 |
  +----------------------+-----------------------------------------+
  |`Kargo`_              |2c23027794d7851ee31363c5b6594180741ee923 |
  +----------------------+-----------------------------------------+

Required K8S metrics
^^^^^^^^^^^^^^^^^^^^

At this layer we should get K8S health metrics and ETCD performance metrics:

.. table::

  +------------------------+-----------------------------------------+
  |ETCD performance metrics|Members count / states                   |
  |                        +-----------------------------------------+
  |                        |Number of keys in a cluster              |
  |                        +-----------------------------------------+
  |                        |Size of data set                         |
  |                        +-----------------------------------------+
  |                        |Avg. latency from leader to followers    |
  |                        +-----------------------------------------+
  |                        |Bandwidth rate, send/receive             |
  |                        +-----------------------------------------+
  |                        |Create store success/fail                |
  |                        +-----------------------------------------+
  |                        |Get success/fail                         |
  |                        +-----------------------------------------+
  |                        |Set success/fail                         |
  |                        +-----------------------------------------+
  |                        |Package rate, send/receive               |
  |                        +-----------------------------------------+
  |                        |Expire count                             |
  |                        +-----------------------------------------+
  |                        |Update success/fail                      |
  |                        +-----------------------------------------+
  |                        |Compare-and-swap success/fail            |
  |                        +-----------------------------------------+
  |                        |Watchers                                 |
  |                        +-----------------------------------------+
  |                        |Delete success/fail                      |
  |                        +-----------------------------------------+
  |                        |Compare-and-delete success/fail          |
  |                        +-----------------------------------------+
  |                        |Append req, send/receive                 |
  +------------------------+-----------------------------------------+
  |K8S health metrics      |Number of nodes in each state            |
  |                        +-----------------------------------------+
  |                        |Total number of namespaces               |
  |                        +-----------------------------------------+
  |                        |Total number of PODs per cluster,node,ns |
  |                        +-----------------------------------------+
  |                        |Total number of services                 |
  |                        +-----------------------------------------+
  |                        |Endpoints in each service                |
  |                        +-----------------------------------------+
  |                        |Number of API service instances          |
  |                        +-----------------------------------------+
  |                        |Number of controller instances           |
  |                        +-----------------------------------------+
  |                        |Number of scheduler instances            |
  |                        +-----------------------------------------+
  |                        |Cluster resources, scheduler view        |
  +------------------------+-----------------------------------------+
  |K8S API log analysis    |Number of responses (per each HTTP code) |
  |                        +-----------------------------------------+
  |                        |Response Time                            |
  +------------------------+-----------------------------------------+

For the last two metrics (K8S API log analysis) we should use a log collector
to store and parse all log records within the K8S environment.

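
As an illustration of how one of the health metrics in the table, the number of nodes in each state, could be derived, here is a minimal Python sketch. It is not the ``k8s_get_metrics.sh`` script used in this setup. The sample document is made up and only mimics the shape of a Kubernetes node list (``items[].status.conditions``); the function names are hypothetical.

```python
import json
from collections import Counter

# Trimmed, made-up document shaped like a Kubernetes node list response.
SAMPLE = """
{"items": [
  {"metadata": {"name": "node1"},
   "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
  {"metadata": {"name": "node2"},
   "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
  {"metadata": {"name": "node3"},
   "status": {"conditions": [{"type": "OutOfDisk", "status": "False"},
                             {"type": "Ready", "status": "True"}]}}
]}
"""

def ready_state(node):
    """Return the status of the Ready condition, or "Unknown" if it is absent."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown")
    return "Unknown"

def nodes_per_state(node_list):
    """Count nodes by the status of their Ready condition."""
    return Counter(ready_state(n) for n in node_list.get("items", []))

counts = nodes_per_state(json.loads(SAMPLE))
print(dict(counts))
```

The same counting approach would apply to namespaces, PODs and services once the corresponding list is fetched.
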

Openstack Layer
---------------

CCP stands for "Containerized Control Plane". CCP aims to build, run and manage
production-ready OpenStack containers on top of a Kubernetes cluster.

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Fuel-CCP`_         |8570d0e0e512bd16f8449f0a10b1e3900fd09b2d |
  +--------------------+-----------------------------------------+

CCP configuration
^^^^^^^^^^^^^^^^^

CCP was deployed on top of a 200-node K8S cluster in the following configuration:

.. code-block:: yaml

  node[1-3]: Kubernetes
  node([4-6])$: # 4-6
    roles:
      - controller
      - openvswitch
  node[7-9]$: # 7-9
    roles:
      - rabbitmq
  node10$: # 10
    roles:
      - galera
  node11$: # 11
    roles:
      - heat
  node(1[2-9])$: # 12-19
    roles:
      - compute
      - openvswitch
  node[2-9][0-9]$: # 20-99
    roles:
      - compute
      - openvswitch
  node(1[0-9][0-9])$: # 100-199
    roles:
      - compute
      - openvswitch
  node200$:
    roles:
      - backup

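
The node name patterns in this topology are regular expressions. The following Python sketch shows how they partition the 200 nodes; it is an illustration only, not CCP code. Matching each pattern from the start of the node name is an assumption of the sketch, and ``roles_for`` is a hypothetical helper.

```python
import re

# Patterns and roles copied from the topology above.
TOPOLOGY = [
    (r"node([4-6])$", ["controller", "openvswitch"]),
    (r"node[7-9]$", ["rabbitmq"]),
    (r"node10$", ["galera"]),
    (r"node11$", ["heat"]),
    (r"node(1[2-9])$", ["compute", "openvswitch"]),
    (r"node[2-9][0-9]$", ["compute", "openvswitch"]),
    (r"node(1[0-9][0-9])$", ["compute", "openvswitch"]),
    (r"node200$", ["backup"]),
]

def roles_for(name):
    """Collect the roles of every pattern that matches the node name."""
    roles = []
    for pattern, pattern_roles in TOPOLOGY:
        # Assumption: patterns are anchored at the start of the name.
        if re.match(pattern, name):
            roles.extend(pattern_roles)
    return roles

nodes = [f"node{i}" for i in range(1, 201)]
compute_nodes = [n for n in nodes if "compute" in roles_for(n)]

print(len(compute_nodes))
print(roles_for("node5"), roles_for("node150"), roles_for("node2"))
```

With these patterns, nodes 12 to 199 carry the compute role, and nodes 1 to 3 receive no CCP role because they are reserved for Kubernetes itself.
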

CCP OpenStack services list (`versions.yaml`_):

.. code-block:: yaml

  openstack/cinder:
    git_ref: stable/newton
    git_url: https://github.com/openstack/cinder.git
  openstack/glance:
    git_ref: stable/newton
    git_url: https://github.com/openstack/glance.git
  openstack/heat:
    git_ref: stable/newton
    git_url: https://github.com/openstack/heat.git
  openstack/horizon:
    git_ref: stable/newton
    git_url: https://github.com/openstack/horizon.git
  openstack/keystone:
    git_ref: stable/newton
    git_url: https://github.com/openstack/keystone.git
  openstack/neutron:
    git_ref: stable/newton
    git_url: https://github.com/openstack/neutron.git
  openstack/nova:
    git_ref: stable/newton
    git_url: https://github.com/openstack/nova.git
  openstack/requirements:
    git_ref: stable/newton
    git_url: https://git.openstack.org/openstack/requirements.git
  openstack/sahara-dashboard:
    git_ref: stable/newton
    git_url: https://git.openstack.org/openstack/sahara-dashboard.git

`K8S Ingress Resources`_ rules were enabled during CCP deployment to expose OpenStack service
endpoints to an external routable network.

See the CCP deployment script and configuration files in the
`CCP deployment and configuration files`_ section.

Required Openstack-related metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At this layer we should get OpenStack environment metrics
as well as API and resource utilization metrics.

.. table:: OpenStack-related metrics

  +------------------------+-----------------------------------------+
  |OpenStack metrics       |Total number of controller nodes         |
  |                        +-----------------------------------------+
  |                        |Total number of services                 |
  |                        +-----------------------------------------+
  |                        |Total number of compute nodes            |
  |                        +-----------------------------------------+
  |                        |Total number of nodes                    |
  |                        +-----------------------------------------+
  |                        |Total number of VMs                      |
  |                        +-----------------------------------------+
  |                        |Number of VMs per tenant, per node       |
  |                        +-----------------------------------------+
  |                        |Resource utilization per project,service |
  |                        +-----------------------------------------+
  |                        |Total number of tenants                  |
  |                        +-----------------------------------------+
  |                        |API request time                         |
  |                        +-----------------------------------------+
  |                        |Mean time to spawn VM                    |
  +------------------------+-----------------------------------------+

Implementation
==============

This part of the document describes the implementation of the monitoring system.
We chose the following software to perform all required tasks:

.. table::

  +-----------------------------------------+-----------------------------------------+
  |Monitoring Node Server Side              |Monitored Node Client Side               |
  +--------------------+--------------------+--------------------+--------------------+
  |Metrics server      |Log storage         |Metrics agent       |Log collector       |
  +--------------------+--------------------+--------------------+--------------------+
  | `Prometheus`_ \+   | `ElasticSearch`_   |`Telegraf`_         | `Heka`_            |
  | `Grafana`_         | \+ `Kibana`_       |                    |                    |
  +--------------------+--------------------+--------------------+--------------------+

Server Side Software
--------------------

Prometheus
^^^^^^^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Prometheus GitHub`_|7e369b9318a4d5d97a004586a99f10fa51a46b26 |
  +--------------------+-----------------------------------------+

Because of the high load, we ran into Prometheus performance problems with metric counts of up to 15 million.
We therefore split the Prometheus setup into two standalone nodes. The first node polls API metrics from
K8S-related services, which the K8S API and the ETCD API natively expose at the ``/metrics`` URI by default.
The second node stores all other metrics, which are collected and calculated locally on the environment
servers by Telegraf.

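
To give a feel for what a ``/metrics`` endpoint returns, the sketch below parses a few lines in the Prometheus text exposition format and counts the resulting series. It is a simplified Python illustration, not part of our setup: the sample payload is made up, and the parser ignores optional timestamps.

```python
# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP etcd_server_has_leader Whether or not a leader exists.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE apiserver_request_count counter
apiserver_request_count{code="200",verb="GET"} 1027
apiserver_request_count{code="404",verb="GET"} 3
"""

def parse_samples(text):
    """Yield (series, value) pairs, skipping comments and blank lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Simplification: the value is whatever follows the last space.
        series, _, value = line.rpartition(" ")
        yield series, float(value)

samples = list(parse_samples(SAMPLE))
print(len(samples))
print(samples[0])
```

Every distinct combination of metric name and labels is a separate series, which is how a 200-node environment can grow to millions of metrics.
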

The deployment scripts and configuration files for the Prometheus nodes can be found in the
`Prometheus deployment and configuration files`_ section.

Grafana
^^^^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Grafana`_          |v4.0.1                                   |
  +--------------------+-----------------------------------------+

Grafana was used as the metrics visualizer.
A separate dashboard was built for each group of metrics:

- System nodes metrics
- Kubernetes metrics
- ETCD metrics
- OpenStack metrics

You can find their settings in the `Grafana dashboards configuration`_ section.

Grafana server deployment script:

.. code-block:: bash

  #!/bin/bash
  ansible-playbook -i ./hosts ./deploy-graf-prom.yaml --tags "grafana"

It uses the same YAML configuration file, `deploy-graf-prom.yaml`_, from the
`Prometheus deployment and configuration files`_ section.

ElasticSearch
^^^^^^^^^^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`ElasticSearch`_    |2.4.2                                    |
  +--------------------+-----------------------------------------+

ElasticSearch is a well-known, proven log storage. We used it as a standalone
node for collecting Kubernetes API logs and all other container logs across the environment.
To get adequate performance in the 200-node lab, we increased ``ES_HEAP_SIZE`` from the default 1G to 10G
in the /etc/default/elasticsearch configuration file.

ElasticSearch and the Kibana dashboard were installed with the
`deploy_elasticsearch_kibana.sh`_ deployment script.

Kibana
^^^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Kibana`_           |4.5.4                                    |
  +--------------------+-----------------------------------------+

We used Kibana as the main visualization tool for ElasticSearch. It let us build
charts based on the analysis of K8S API logs. Kibana was installed on a single separate node
with a single dashboard showing the K8S API response time graph.

Dashboard settings:

:download:`Kibana_dashboard.json <configs/dashboards/Kibana_dashboard.json>`

Client side Software
--------------------

Telegraf
^^^^^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Telegraf`_         |v1.0.0-beta2-235-gbc14ac5                |
  |                    |git: openstack_stats                     |
  |                    |bc14ac5b9475a59504b463ad8f82ed810feed3ec |
  +--------------------+-----------------------------------------+

Telegraf was chosen as the client-side metrics agent. It provides multiple ways to poll and calculate
metrics from a variety of sources. Thanks to its plugin-driven design, it takes data from different
inputs and exposes the calculated metrics in Prometheus format. We used a forked version of Telegraf
with custom patches in order to use a custom OpenStack input plugin:

- `GitHub Telegraf Fork`_
- `Go SDK for OpenStack`_

The automation scripts and configuration files used to start the Telegraf agent
across the environment nodes are listed in the
`Telegraf deployment and configuration files`_ section.

The following plugins were used to obtain metrics.

Standard Plugins
""""""""""""""""

.. code:: bash

  inputs.cpu
  inputs.disk
  inputs.diskio
  inputs.kernel
  inputs.mem
  inputs.processes
  inputs.swap
  inputs.system
  inputs.kernel_vmstat
  inputs.net
  inputs.netstat
  inputs.exec

Openstack input plugin
""""""""""""""""""""""

The custom ``inputs.openstack`` plugin was used to gather most of the required OpenStack-related metrics.

Settings:

.. code:: bash

  interval = '40s'
  identity_endpoint = "http://keystone.ccp.svc.cluster.local:5000/v3"
  domain = "default"
  project = "admin"
  username = "admin"
  password = "password"

``inputs.exec`` plugin
""""""""""""""""""""""

The ``inputs.exec`` plugin was used to trigger scripts that poll
and calculate all non-standard metrics.

Common settings:

.. code:: bash

  interval = "15s"
  timeout = "30s"
  data_format = "influx"

Commands:

.. code:: bash

  "/opt/telegraf/bin/list_openstack_processes.sh"
  "/opt/telegraf/bin/per_process_cpu_usage.sh"
  "/opt/telegraf/bin/numa_stat_per_pid.sh"
  "/opt/telegraf/bin/iostat_per_device.sh"
  "/opt/telegraf/bin/memory_bandwidth.sh"
  "/opt/telegraf/bin/network_tcp_queue.sh"
  "/opt/telegraf/bin/etcd_get_metrics.sh"
  "/opt/telegraf/bin/k8s_get_metrics.sh"
  "/opt/telegraf/bin/vmtime.sh"
  "/opt/telegraf/bin/osapitime.sh"

The full Telegraf configuration file and its custom input scripts are available in the
`Telegraf deployment and configuration files`_ section.

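
Because ``data_format`` is set to ``"influx"``, every script started this way has to print its results in InfluxDB line protocol. The sketch below shows the idea in Python; the scripts actually used are the shell scripts listed above. The measurement name ``node_load``, the tag and the field names are hypothetical, and the input is a hard-coded sample of ``/proc/loadavg`` content rather than the real file.

```python
# Sample content of /proc/loadavg; a real script would read the file instead.
SAMPLE = "0.52 0.58 0.59 3/1024 12345\n"

def loadavg_to_influx(text, host):
    """Convert /proc/loadavg content into one InfluxDB line-protocol record."""
    load1, load5, load15, procs, _last_pid = text.split()
    # The fourth field is "<runnable entities>/<total entities>".
    running, total = procs.split("/")
    fields = (
        f"load1={float(load1)},load5={float(load5)},load15={float(load15)},"
        f"running={int(running)}i,total={int(total)}i"
    )
    return f"node_load,host={host} {fields}"

line = loadavg_to_influx(SAMPLE, "node1")
print(line)
```

Telegraf would then parse each printed line and expose the fields as metrics in Prometheus format.
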

Heka
^^^^

.. table::

  +--------------------+-----------------------------------------+
  |Software            |Version                                  |
  +--------------------+-----------------------------------------+
  |`Heka`_             |0.10.0                                   |
  +--------------------+-----------------------------------------+

We chose Heka as the log collecting agent for its wide variety of inputs
(including the ability to read data from the Docker socket), its filters (custom sandbox filters
written in Lua) and its ability to encode data for ElasticSearch.

With the Heka agent running on all environment servers, we were able to send container logs to the
ElasticSearch server. A custom Lua filter extracted the K8S API data and converted it into a format
suitable for visualizing API timing counters (average response time).

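
The kind of processing done by such a filter can be sketched as follows. The example is in Python rather than the Lua sandbox used by Heka, and it is not the ``kubeapi_to_int.lua.j2`` filter itself. The log line layout is a hypothetical simplification; the real K8S API log format may differ.

```python
import re
from collections import defaultdict

# Hypothetical log lines; the real kube-apiserver format may differ.
SAMPLE = [
    "GET /api/v1/nodes: (1.5ms) 200",
    "GET /api/v1/pods: (2.5ms) 200",
    "POST /api/v1/namespaces/ccp/pods: (12ms) 201",
    "GET /api/v1/nodes/node9: (0.5ms) 404",
]

LINE_RE = re.compile(
    r"^(?P<verb>[A-Z]+) (?P<path>\S+): \((?P<ms>[\d.]+)ms\) (?P<code>\d{3})"
)

def summarize(lines):
    """Return {http_code: (request_count, average_response_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # Skip lines that are not request records.
        entry = totals[match.group("code")]
        entry[0] += 1
        entry[1] += float(match.group("ms"))
    return {code: (count, total / count) for code, (count, total) in totals.items()}

summary = summarize(SAMPLE)
print(summary)
```

The result covers both K8S API log metrics: the number of responses per HTTP code and the response time.
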

The Heka deployment scripts and the configuration file with the custom Lua filter are in the
`Heka deployment and configuration`_ section.

Appendices
==========

Kargo deployment script
-----------------------

deploy_k8s_using_kargo.sh
^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: configs/deploy_k8s_using_kargo.sh
    :language: bash

CCP deployment and configuration files
--------------------------------------

deploy-ccp.sh
^^^^^^^^^^^^^

.. literalinclude:: configs/ccp/deploy-ccp.sh
    :language: bash

ccp.yaml
^^^^^^^^

.. literalinclude:: configs/ccp/ccp.yaml
    :language: yaml

configs.yaml
^^^^^^^^^^^^

.. literalinclude:: configs/ccp/configs.yaml
    :language: yaml

topology.yaml
^^^^^^^^^^^^^

.. literalinclude:: configs/ccp/topology.yaml
    :language: yaml

repos.yaml
^^^^^^^^^^

.. literalinclude:: configs/ccp/repos.yaml
    :language: yaml

versions.yaml
^^^^^^^^^^^^^

.. literalinclude:: configs/ccp/versions.yaml
    :language: yaml

Prometheus deployment and configuration files
---------------------------------------------

Deployment scripts
^^^^^^^^^^^^^^^^^^

deploy_prometheus.sh
""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/deploy_prometheus.sh
    :language: bash

deploy-graf-prom.yaml
"""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/deploy-graf-prom.yaml
    :language: yaml

docker_prometheus.yaml
""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/docker_prometheus.yaml
    :language: yaml

deploy_etcd_collect.sh
""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/deploy_etcd_collect.sh
    :language: bash

Configuration files
^^^^^^^^^^^^^^^^^^^

prometheus-kuber.yml.j2
"""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/prometheus/prometheus-kuber.yml.j2
    :language: bash

prometheus-system.yml.j2
""""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/prometheus/prometheus-system.yml.j2
    :language: bash

targets.yml.j2
""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/prometheus/targets.yml.j2
    :language: bash

Grafana dashboards configuration
--------------------------------

:download:`Systems_nodes_statistics.json <configs/dashboards/Systems_nodes_statistics.json>`

:download:`Kubernetes_statistics.json <configs/dashboards/Kubernetes_statistics.json>`

:download:`ETCD.json <configs/dashboards/ETCD.json>`

:download:`OpenStack.json <configs/dashboards/OpenStack.json>`

ElasticSearch deployment script
-------------------------------

deploy_elasticsearch_kibana.sh
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: configs/elasticsearch-heka/deploy_elasticsearch_kibana.sh
    :language: bash

Telegraf deployment and configuration files
-------------------------------------------

deploy_telegraf.sh
^^^^^^^^^^^^^^^^^^

.. literalinclude:: configs/prometheus-grafana-telegraf/deploy_telegraf.sh
    :language: bash

deploy-telegraf.yaml
^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: configs/prometheus-grafana-telegraf/deploy-telegraf.yaml
    :language: yaml

Telegraf system
^^^^^^^^^^^^^^^

telegraf-sys.conf
"""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/telegraf-sys.conf
    :language: bash

Telegraf openstack
^^^^^^^^^^^^^^^^^^

telegraf-openstack.conf.j2
""""""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/telegraf-openstack.conf.j2
    :language: bash

Telegraf inputs scripts
^^^^^^^^^^^^^^^^^^^^^^^

list_openstack_processes.sh
"""""""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/list_openstack_processes.sh
    :language: bash

per_process_cpu_usage.sh
""""""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/per_process_cpu_usage.sh
    :language: bash

numa_stat_per_pid.sh
""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/numa_stat_per_pid.sh
    :language: bash

iostat_per_device.sh
""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/iostat_per_device.sh
    :language: bash

memory_bandwidth.sh
"""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/memory_bandwidth.sh
    :language: bash

network_tcp_queue.sh
""""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/network_tcp_queue.sh
    :language: bash

etcd_get_metrics.sh
"""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/etcd_get_metrics.sh
    :language: bash

k8s_get_metrics.sh
""""""""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/k8s_get_metrics.sh
    :language: bash

vmtime.sh
"""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/vmtime.sh
    :language: bash

osapitime.sh
""""""""""""

.. literalinclude:: configs/prometheus-grafana-telegraf/telegraf/scripts/osapitime.sh
    :language: bash

Heka deployment and configuration
---------------------------------

Deployment
^^^^^^^^^^

deploy_heka.sh
""""""""""""""

.. literalinclude:: configs/elasticsearch-heka/deploy_heka.sh
    :language: bash

deploy-heka.yaml
""""""""""""""""

.. literalinclude:: configs/elasticsearch-heka/deploy-heka.yaml
    :language: yaml

Configuration
^^^^^^^^^^^^^

00-hekad.toml.j2
""""""""""""""""

.. literalinclude:: configs/elasticsearch-heka/heka/00-hekad.toml.j2
    :language: bash

kubeapi_to_int.lua.j2
"""""""""""""""""""""

.. literalinclude:: configs/elasticsearch-heka/heka/kubeapi_to_int.lua.j2
    :language: bash

.. references:

.. _Fuel-CCP-Installer: https://github.com/openstack/fuel-ccp-installer
.. _Kargo: https://github.com/kubernetes-incubator/kargo.git
.. _Fuel-CCP: https://github.com/openstack/fuel-ccp
.. _Prometheus: https://prometheus.io/
.. _Prometheus GitHub: https://github.com/prometheus/prometheus
.. _Grafana: http://grafana.org/
.. _ElasticSearch: https://www.elastic.co/products/elasticsearch
.. _Kibana: https://www.elastic.co/products/kibana
.. _Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
.. _GitHub Telegraf Fork: https://github.com/spjmurray/telegraf/tree/openstack_stats/plugins/inputs/openstack
.. _Go SDK for OpenStack: https://github.com/rackspace/gophercloud/
.. _Heka: https://hekad.readthedocs.io/en/v0.10.0/
.. _K8S Ingress Resources: http://kubernetes.io/docs/user-guide/ingress/