Administration

Logging

Each system component logs its own activity via the Syslog protocol. This makes it possible to build an audit trail of all events executed in the system.

The system can also direct logging into one of the pipelines, so that you can monitor your own installation.
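As an illustration of Syslog-based logging, the following minimal Python sketch forwards log records to a Syslog daemon using only the standard library. The address ("localhost", 514), the logger name and the helper name build_syslog_logger are assumptions made for this example; it does not show BitSwan's own logging setup.

```python
import logging
import logging.handlers


def build_syslog_logger(name="example.pipeline", address=("localhost", 514)):
    """Return a logger that forwards records to a Syslog daemon over UDP.

    The logger name and the address are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    # Attach the Syslog handler only once, even if the function is called again.
    if not any(isinstance(h, logging.handlers.SysLogHandler) for h in logger.handlers):
        handler = logging.handlers.SysLogHandler(address=address)
        handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_syslog_logger()
    log.info("pipeline started")
```

Because the handler uses UDP by default, sending a record does not fail when no daemon is listening; the record is simply lost.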

Monitoring

Internal control mechanisms of the BitSwan product monitor the whole system and supply the user interface with important information about the state and correctness of real-time data processing. This functionality is essential for debugging real-time data processing, which requires the whole system to run precisely.

Monitoring BSPump

BitSwan provides the following main metrics by default for each process (i.e. pipeline):

  • Throughput: measures IOPS (input/output operations per second), i.e. the number of events entering and leaving a given pipeline
  • Duty cycle: shows the ratio of the time a pump spends processing data to the time it spends waiting for data to arrive
  • Time drift: shows the delay between the real-time occurrence of an event and its processing
  • Error ratio: indicates how much of the data in a pipeline is damaged
  • Warning ratio: indicates how much of the data in a pipeline is potentially erroneous
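To make the metric definitions more concrete, here is a minimal Python sketch that derives them from raw counters for one measurement interval. The class PipelineSample, its field names and the formulas are assumptions chosen for illustration (for example, duty cycle is computed here as the busy fraction of the interval); they are one plausible reading of the definitions, not BSPump's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class PipelineSample:
    """Counters observed for one pipeline over one measurement interval.

    All field names are hypothetical and chosen for this example only.
    """
    interval_s: float        # length of the measurement interval in seconds
    events_in: int           # events that entered the pipeline
    events_out: int          # events that left the pipeline
    busy_s: float            # time spent processing events
    errors: int              # events that failed processing
    warnings: int            # events flagged as potentially erroneous
    event_time_s: float      # timestamp carried by the last event
    processed_time_s: float  # wall-clock time when that event was processed


def summarize(s: PipelineSample) -> dict:
    """Derive the metrics from raw counters (assumed formulas)."""
    total = max(s.events_in, 1)  # avoid division by zero on an idle pipeline
    return {
        "throughput_in": s.events_in / s.interval_s,
        "throughput_out": s.events_out / s.interval_s,
        "duty_cycle": s.busy_s / s.interval_s,
        "time_drift_s": s.processed_time_s - s.event_time_s,
        "error_ratio": s.errors / total,
        "warning_ratio": s.warnings / total,
    }


if __name__ == "__main__":
    sample = PipelineSample(10.0, 5000, 4900, 2.5, 50, 100, 1000.0, 1001.5)
    print(summarize(sample))
```

With the sample values above, the pipeline takes in 500 events per second, is busy 25 % of the time, lags 1.5 seconds behind real time and has an error ratio of 1 %.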

Monitoring pipeline in Grafana

Established tools such as Grafana or Zabbix can further process the monitoring outputs, in the form of logs and metrics, for more specific performance analysis.

Data in Kibana

Performance overview in Grafana

Moreover, BitSwan can provide performance data to standard monitoring tools such as Zabbix, HP OpenView, etc.

LogMan.io

BitSwan can dispatch its logs and metrics (i.e. telemetry) via the ITGuard product by TeskaLabs, which gives detailed insight into the processing, analysis and storage of data through Grafana and Kibana.

To do so, configure BitSwan with your login username and password:

[logman.io]
url=amqps://{username}:{password}@logman-bridge:5477/{virtualhost}
username={YOUR_USERNAME}
password={YOUR_PASSWORD}
virtualhost={YOUR_VHOST}
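The source does not state how the {username}, {password} and {virtualhost} placeholders in url are resolved. Assuming they are filled from the other keys of the same section, the following Python sketch shows what the resulting connection URL would look like. The helper resolve_url and the sample credentials are hypothetical and exist only for this example.

```python
import configparser
from urllib.parse import quote

# Sample values (alice, s3cret/pass, prod) are made up for illustration.
CONFIG_TEXT = """
[logman.io]
url=amqps://{username}:{password}@logman-bridge:5477/{virtualhost}
username=alice
password=s3cret/pass
virtualhost=prod
"""


def resolve_url(config_text: str, section: str = "logman.io") -> str:
    """Fill the {placeholders} in `url` from the other keys of the section."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    values = dict(parser[section])
    template = values.pop("url")
    # Percent-encode each value so characters such as '/' or '@' stay valid in a URL.
    encoded = {key: quote(value, safe="") for key, value in values.items()}
    return template.format(**encoded)


if __name__ == "__main__":
    print(resolve_url(CONFIG_TEXT))
```

Percent-encoding matters here: a password containing "/" or "@" would otherwise change how the URL is parsed.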

InfluxDB

BitSwan can be configured to dispatch the above-mentioned metrics to an InfluxDB database, from which they can be loaded and visualized in Grafana.

[asab:metrics:influxdb]
url={YOUR_INFLUXDB_URL}
db={YOUR_INFLUXDB_DATABASE}
username={YOUR_INFLUXDB_USERNAME}
password={YOUR_INFLUXDB_PASSWORD}
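For orientation, data points sent to InfluxDB are commonly written in the InfluxDB line protocol (measurement, tags, fields, timestamp). The Python sketch below renders one such point. The helper to_line_protocol, the tag names and the values are assumptions for illustration; the exact measurements and tags that BitSwan emits are not specified here.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape the characters that line protocol treats as separators."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one data point as an InfluxDB line protocol string."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    parts = []
    for key in sorted(fields):
        value = fields[key]
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integers carry an 'i' suffix
        else:
            rendered = repr(float(value))
        parts.append(_escape(key, ",= ") + "=" + rendered)
    return f"{head} {','.join(parts)} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical tags and values; only the measurement name appears in this document.
    print(to_line_protocol(
        "bspump.pipeline.profiler",
        {"pipeline": "SamplePipeline", "processor": "Parse Step"},
        {"duration": 0.42},
        1700000000000000000,
    ))
```

Tags are sorted by key, and spaces, commas and equals signs inside tag keys and values are escaped so that the line parses unambiguously.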

Grafana

Grafana is a tool for creating graphs and other visualisations of BitSwan metrics. Metrics stored in an InfluxDB database can serve as its input.

Performance in Grafana

BSPUMP Profiling

This document describes the new bspump profiling capabilities, the metrics involved and their intended usage. It complements the BSPUMP Profiling Grafana template enclosed with this documentation.

Exporting the dashboard to another Grafana instance involves the following steps:

  1. Download the .json template from the link provided above (then skip to step 4)
  2. Choose “Share dashboard” in the upper right section of the Grafana GUI
  3. Click “Export” and then “Save to file”
  4. Navigate to the desired Grafana instance and choose “Create → Import → Upload .json file → Load”
  5. Choose the datasource containing your bspump.profiling metrics
    (generally an InfluxDB → bspump.pipeline.profiler)
  6. After selecting the proper database, update your variables (these can be found under Settings → Variables)
    You may need to re-enter the definition inside each variable's settings.

Metrics

  1. The basic metric is duration. It represents how long the process has been running within the last 10 seconds
    (the application event tick)
  2. It is collected for every component in a pipeline (sources, processors, generators, sinks…)
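The idea of a per-component duration collected once per tick can be sketched in Python as follows. The class TickProfiler and its methods are hypothetical names for this example and do not reflect how bspump implements profiling; the sketch only shows time being accumulated per component and reset when it is reported.

```python
import time
from collections import defaultdict


class TickProfiler:
    """Accumulate per-component processing time and report it on demand."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timing can be controlled in tests
        self._durations = defaultdict(float)

    def measure(self, component, func, *args, **kwargs):
        """Run func and add its running time to the component's total."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            self._durations[component] += self._clock() - start

    def flush(self):
        """Return the durations gathered since the last flush and reset them."""
        snapshot = dict(self._durations)
        self._durations.clear()
        return snapshot


if __name__ == "__main__":
    profiler = TickProfiler()
    profiler.measure("parser", sum, range(1_000_000))
    profiler.measure("sink", print, "event stored")
    print(profiler.flush())  # seconds spent per component since the last flush
```

In a running application, flush would be called on every tick (every 10 seconds in the description above), so each reported value is the time a component spent working during that tick.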

Usage

  1. In general, profiling metrics help identify weak points in a pipeline
  2. The pipeline view shows which components of a particular pipeline take more time than they should (or than we expect or want them to)
  3. It also shows pipeline behavior following outages and restarts
  4. Changes in behavior over time are visible as well

Evaluation

When evaluating the diagrams, especially when comparing two graphs or two components within one graph, always bear in mind the scale of the axes (especially the y axis).