SmartMonitoring: business logic monitoring at Odnoklassniki

The most critical aspect of business logic monitoring is identifying the nature of an incident. This often takes a significant amount of time and requires a highly qualified specialist: the main performance indicators of the website and its services are spread across thousands of graphs, so tracking down the original source of a problem is challenging. That is why we created the SmartMonitoring system. It detects anomalies in the portal's operation and shows how they are connected, which makes the work of administrators and developers easier. During an incident, the system helps determine exactly which of a dozen of our services failed first, speeds up "unraveling the tangle" during the investigation, and helps quickly find the service at fault. Below we describe how we built such a system, how it works, and which difficulties we faced.

