Data analytics in the BitSwan product
BitSwan is designed for fast and precise real-time data analysis. To this end, it connects to data streams from various sources, such as Kafka, RabbitMQ, Syslog, or a growing set of files in data storage.
Besides letting you define your own pipeline processors (e.g. for processing, parsing and enriching data) and metrics, the BSPump project provides predefined real-time analysis components that can be used in a concrete solution built on the BitSwan product.
Stream analytics output can, for instance, be used to send notifications when set levels are exceeded. Detecting such threshold violations (outliers) is part of anomaly detection: it makes it possible to find individual metrics that differ from the common behaviour of metrics within a given group of data. This provides an elegant mechanism for detecting problems in the system automatically. Anomaly detection outputs are stored in an ElasticSearch database in so-called alarm indexes, where current and historical detections can be browsed and filtered.
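As a rough illustration of outlier detection over grouped metrics, the following sketch flags groups whose metric lies far from the median of all groups. It is not the BSPump implementation: the function name `find_outliers`, the median absolute deviation (MAD) rule, the factor `k` and the sample data are all assumptions made for this example.

```python
from statistics import median


def find_outliers(metric_by_group, k=3.0):
    """Return the groups whose metric differs from the median of all
    groups by more than k times the median absolute deviation (MAD).

    Hypothetical helper; the detection rule is an assumption.
    """
    values = list(metric_by_group.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the groups sit exactly on the median, so this
        # rule cannot rank deviations; the sketch reports nothing then.
        return {}
    return {
        group: value
        for group, value in metric_by_group.items()
        if abs(value - med) > k * mad
    }


# Hypothetical metric: number of requests per host in the last minute.
requests_per_host = {
    "host-a": 102,
    "host-b": 98,
    "host-c": 105,
    "host-d": 97,
    "host-e": 940,
}
alarms = find_outliers(requests_per_host)
print(alarms)  # {'host-e': 940}
```

In a real deployment, each entry of `alarms` would correspond to one document written to an alarm index.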
A dimension is a piece of information that connects individual data events. It serves to classify and describe data correctly so that the data make sense in subsequent analyses, visualisations and descriptions. There can be, for instance, time, geographical or technological dimensions.
An example of a time dimension is the time at which the event occurred in the system, carried by the event itself, or the duration of the phenomenon that produced the event. Geographical dimensions include, for instance, geographical coordinates looked up from IP addresses, countries of origin, etc. An example of a technological dimension is the type of device on which the event originated.
Dimensions are used to correlate individual events; e.g. geographical dimensions make it possible to group events by specific cities, regions and countries. Using session dimensions, individual events can be assigned to specific sessions, so that the behaviour of given users can be tracked.
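Correlating events by a dimension can be sketched as a simple grouping step. The event layout and the field names `country`, `city` and `session_id` below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def group_by_dimension(events, dimension):
    """Group events (plain dicts) by the value of one dimension field.

    Events that lack the field end up under the key None.
    """
    groups = defaultdict(list)
    for event in events:
        groups[event.get(dimension)].append(event)
    return dict(groups)


# Hypothetical events with a geographical and a session dimension.
events = [
    {"country": "CZ", "city": "Prague", "session_id": "s1", "bytes": 120},
    {"country": "CZ", "city": "Brno", "session_id": "s2", "bytes": 80},
    {"country": "DE", "city": "Berlin", "session_id": "s1", "bytes": 200},
]

by_country = group_by_dimension(events, "country")
by_session = group_by_dimension(events, "session_id")
print(sorted(by_country))     # ['CZ', 'DE']
print(len(by_session["s1"]))  # 2
```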
A stream analyzer is a type of analyzer that is an integral part of a pipeline and produces its own events based on the data stream.
Time Drift Analyzer
Time Drift Analyzer is a specific type of analyzer focused on the difference between the time at which an event occurred and the current time, i.e. the time at which the event passes through the stream analyzer. It describes how the system behaves when the real-time data stream is delayed. This can help identify system faults, or it can serve simply as a reference for system behaviour that has to be taken into account in subsequent event processing.
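The underlying computation can be sketched in a few lines. This is an illustration only, not the BSPump analyzer itself: the field name `@timestamp`, the use of UNIX seconds and the helper names are assumptions.

```python
import time


def time_drift(event, now=None, timestamp_field="@timestamp"):
    """Return how many seconds the event lags behind the current time.

    `timestamp_field` is a hypothetical field holding a UNIX timestamp.
    """
    if now is None:
        now = time.time()
    return now - event[timestamp_field]


def summarize_drift(drifts):
    """Aggregate the observed drifts into simple statistics."""
    return {
        "min": min(drifts),
        "max": max(drifts),
        "avg": sum(drifts) / len(drifts),
    }


# A fixed "current time" keeps the example deterministic.
now = 1_000_000.0
events = [
    {"@timestamp": 999_990.0},
    {"@timestamp": 999_998.0},
    {"@timestamp": 999_999.0},
]
drifts = [time_drift(event, now=now) for event in events]
summary = summarize_drift(drifts)
print(drifts)  # [10.0, 2.0, 1.0]
```

A steadily growing maximum drift would suggest that the stream is falling behind real time.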
Time Window Analyzer
Time Window Analyzer is a specific type of analyzer that evaluates incoming events in the context of a time window and the observed dimensions, and performs over them a mathematical operation determined by the input function.
It captures events and arranges them by a time dimension (e.g. current time or time of the event) as well as by another chosen dimension, e.g. IP address. Over the events in a defined time window, it performs various mathematical operations on their fields, such as summing values, finding minimum and maximum values, counting values, computing standard deviations, and the like.
Over the value matrix prepared in this way, it then performs further mathematical operations that form the input to the actual analysis, e.g. finding dimensions that differ from the others (outliers), grouping dimensions by their values (classification), finding time intervals with empty values (drop to zero) or specific values (triggers), and the like.
The output of Time Window Analyzer is a new event that carries the result of a mathematical operation applied to the values captured in the given time window.
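The steps described above (building a value matrix per dimension and time column, then analysing it) can be sketched in plain Python. The function names, the event fields `time`, `ip` and `bytes`, and the choice of summation as the aggregation are assumptions for this example and do not reflect the actual BSPump API.

```python
from collections import defaultdict


def build_window_matrix(events, start, resolution, columns, dimension, value_field):
    """Sum `value_field` per dimension value (row) and time column.

    The window covers [start, start + resolution * columns); events
    outside of it are ignored in this sketch.
    """
    matrix = defaultdict(lambda: [0] * columns)
    for event in events:
        column = int((event["time"] - start) // resolution)
        if 0 <= column < columns:
            matrix[event[dimension]][column] += event[value_field]
    return dict(matrix)


def find_drops_to_zero(matrix):
    """Return, per row, the indexes of columns with no captured value."""
    return {
        key: [i for i, value in enumerate(row) if value == 0]
        for key, row in matrix.items()
        if 0 in row
    }


# Hypothetical events: three one-minute columns starting at time 0.
events = [
    {"time": 5, "ip": "10.0.0.1", "bytes": 100},
    {"time": 70, "ip": "10.0.0.1", "bytes": 50},
    {"time": 130, "ip": "10.0.0.1", "bytes": 25},
    {"time": 10, "ip": "10.0.0.2", "bytes": 10},
    {"time": 125, "ip": "10.0.0.2", "bytes": 30},
    {"time": 200, "ip": "10.0.0.2", "bytes": 99},  # outside the window
]
matrix = build_window_matrix(events, start=0, resolution=60, columns=3,
                             dimension="ip", value_field="bytes")
drops = find_drops_to_zero(matrix)

# One output event per affected row.
output_events = [{"ip": ip, "empty_columns": cols} for ip, cols in drops.items()]
print(matrix)         # {'10.0.0.1': [100, 50, 25], '10.0.0.2': [10, 0, 30]}
print(output_events)  # [{'ip': '10.0.0.2', 'empty_columns': [1]}]
```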
BitSwan also handles events that have only just arrived but already fall outside the observed time window (late events), as well as events carrying a time that has not occurred yet (early events). These undesirable states are evaluated, and the user is given feedback for subsequent analysis.
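One way to picture the handling of late and early events is a small classification step that counts how many events fall before, inside or after the window. The function name, the labels and the half-open window boundaries are assumptions for this sketch.

```python
from collections import Counter


def classify_event(event_time, window_start, window_end):
    """Classify an event time against the window [window_start, window_end)."""
    if event_time < window_start:
        return "late"    # older than the observed window
    if event_time >= window_end:
        return "early"   # newer than the observed window
    return "in_window"


# Hypothetical window covering times 100 (inclusive) to 200 (exclusive).
event_times = [50, 150, 250, 199, 200]
labels = [classify_event(t, 100, 200) for t in event_times]
feedback = Counter(labels)
print(labels)  # ['late', 'in_window', 'early', 'in_window', 'early']
```

Counters like `feedback` can then be reported to the user, for example as metrics.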
Session Analyzer
Session Analyzer is a specific type of analyzer that evaluates incoming events in the context of a specific value (e.g. a session ID) and performs predefined operations over them. The validity of a session can be limited in time.
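The idea can be illustrated with a minimal session table. It assumes that a session becomes invalid once `max_age` seconds have passed since its first event; the class name, the method name and this expiry rule are hypothetical and not taken from BSPump.

```python
class SessionTable:
    """Keep a running count and total of a value per session ID."""

    def __init__(self, max_age):
        self.max_age = max_age  # assumed validity of a session, in seconds
        self.sessions = {}

    def add(self, session_id, event_time, value):
        """Add one event to its session and return the session state."""
        session = self.sessions.get(session_id)
        if session is None or event_time - session["started"] > self.max_age:
            # Unknown or expired session: start a fresh one.
            session = {"started": event_time, "count": 0, "total": 0}
            self.sessions[session_id] = session
        session["count"] += 1
        session["total"] += value
        return session


table = SessionTable(max_age=30)
table.add("s1", event_time=0, value=10)
state = dict(table.add("s1", event_time=20, value=5))
print(state)    # {'started': 0, 'count': 2, 'total': 15}

# This event arrives after the session expired, so a new session starts.
renewed = dict(table.add("s1", event_time=100, value=1))
print(renewed)  # {'started': 100, 'count': 1, 'total': 1}
```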