Premon Architecture


In our last post, I introduced the basic characteristics of Premon, the intelligent monitoring solution. In this post, let us look at the architecture that makes those characteristics possible and dissect its pieces one by one.

Premon - High Level Architecture

Notice the series of numbers running from 1 to 6 at the bottom of the picture. Let us go through each of these layers in sequence.

  1. Application / Infrastructure being monitored – This is the application or infrastructure that you would like to monitor intelligently using Premon and the other existing tools which feed data into Premon. The infrastructure could range from server nodes to network boxes, routers, databases, printers etc. The application could be any application that can send a message either to your existing monitoring tools, like Nagios, Hyperic etc., or directly to the Premon Surveillance Controller.
  2. Existing Monitoring Tools – This layer represents the monitoring tools that you already have in your enterprise and would like to keep leveraging. As we discussed in the last post, Premon complements these tools and does not compete with them. Premon has plugins built for the most widely used enterprise products, so if you are using monitoring tools like HP OpenView, Hyperic, Nagios, Cricket etc., you can start using Premon immediately. Also, as we discussed, Premon has an API for building plugins, which you can implement to integrate other tools that you are using or your own custom tools.
  3. Surveillance Controller – This is the first point of contact between the Premon system and the external world. The Premon Surveillance Controller (PSC) accepts inputs from the various systems which are plugged into Premon. The PSC supports both the pull and the push model for getting inputs. You can configure the PSC to pull the events generated by monitoring tools from their agents, or configure the monitoring tools to feed data into the PSC. You can also combine the two approaches depending on your needs. Because Premon can work on the pull model, siphoning relevant data directly from the agents or existing monitoring tools, it is non-invasive and does not require extensive setup.
  4. Event Bus – The Premon Event Bus (PEB) is the message passing mechanism between the events received by Premon and the consumers of these events, namely the PML (Premon Machine Learning) and the PCE (Premon Correlation Engine). The PEB receives from the Surveillance Controller a set of events decorated into a form suitable for the Premon system, and holds these events until the other parts of Premon consume them.
  5. Machine Learning and Correlation Engine – This layer is the heart of the Premon system. All the complex logic and algorithms for relationship matching and for learning about your enterprise infrastructure and applications live here. Let us look at the two pieces one by one:
    • Machine Learning – This is an interesting module of the Premon system where, on the basis of the events passing through the event bus, the system learns about the normal behavior of the enterprise. For example, if Premon has been installed in the enterprise for 3 days, it learns from the flow of events over those 3 days and forms an understanding on that basis. It would learn that more events are generated between 0900 and 1200 hours EST on a weekday, and that CPU usage on the monitored nodes stays between 45% and 65% during these hours. Between 1200 and 1400 hours the event volume falls by around 35% (presumably because people leave for lunch and use the applications less), then there is another peak between 1400 and 1500 hours, after which the load goes down. All these learnings are maintained in the machine learning system and together form the normal behavior. Once a normal behavior is established, the machine learning module automatically creates a complex event that the correlation engine uses to detect anomalies.
      For example: the average number of messages generated at peak time on weekdays is 300,000, but on a particular day only 130,000 messages are generated. This is clearly a deviation from the routine, and the complex event correlates all the statistics being monitored, such as CPU, RAM and network speed, to generate an exception. Once the operations team receives this exception, they can dig out the root cause in a matter of minutes. Further, as soon as an anomaly is detected, the machine learning module creates another rule for the correlation engine, which generates an event as soon as the same state is matched again. So if we found that the router speed had dropped by 20% and CPU usage was at 80% in this situation, then the next time these conditions occur an event is generated in advance so that corrective action can be taken instantly.
    • Correlation Engine – This is where CEP (Complex Event Processing) is done. The correlation engine holds the rules for relationship mapping. It uses them to find relationships within the real-time stream of events passed to the engine and to generate alerts on those events. The rules for the correlation engine are either generated by the Premon system itself on the basis of machine learning or entered manually through the admin interface. The user can also activate, deactivate and remove rules in real time.
      The correlation engine can correlate event streams, intelligently predict the outcome and alert the relevant people.
  6. Output – The last layer represents the outputs generated by the Premon system. These range from real-time queries executed on the Premon system, to trend reports, to information added to the log file for detailed trend analysis. Another important output of the Premon system is the alerts which are sent to people or other systems. These can be regular alerts or intelligent alerts which take concrete actions. For example, when an alert fires you might want to call a web service which triggers another workflow and sends a template-based email to the people concerned. You have the flexibility to implement the AlertAction interface and write the custom behavior that you would like the alert to perform.
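
To make the custom alert behavior in layer 6 more concrete, here is a minimal Java sketch. The post names an AlertAction interface but does not show its signature, so the `perform(Alert)` method, the `Alert` class, the `dispatch` helper and both implementations below are assumptions made for illustration, not Premon's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlertActionSketch {

    // Hypothetical alert payload; Premon's real alert type is not shown in the post.
    static final class Alert {
        final String source;
        final String message;

        Alert(String source, String message) {
            this.source = source;
            this.message = message;
        }
    }

    // Assumed shape of the AlertAction interface named in the post.
    interface AlertAction {
        void perform(Alert alert);
    }

    // Regular alert: records one line, standing in for a log-file entry.
    static final class LogAction implements AlertAction {
        final List<String> lines = new ArrayList<>();

        @Override
        public void perform(Alert alert) {
            lines.add(alert.source + ": " + alert.message);
        }
    }

    // "Intelligent" alert: fills in an email template and queues it.
    // A real implementation would call a web service or mail gateway here.
    static final class TemplateEmailAction implements AlertAction {
        final List<String> outbox = new ArrayList<>();

        @Override
        public void perform(Alert alert) {
            outbox.add("Subject: [Premon] " + alert.source + "\n\n" + alert.message);
        }
    }

    // Runs every configured action for a single alert, in order.
    static void dispatch(Alert alert, List<? extends AlertAction> actions) {
        for (AlertAction action : actions) {
            action.perform(alert);
        }
    }

    public static void main(String[] args) {
        LogAction log = new LogAction();
        TemplateEmailAction email = new TemplateEmailAction();
        Alert alert = new Alert("router-1", "throughput dropped by 20%");

        dispatch(alert, Arrays.asList(log, email));

        System.out.println(log.lines.get(0));
        System.out.println(email.outbox.get(0));
    }
}
```

Keeping each action behind a one-method interface means a new behavior, such as triggering a workflow, can be added without touching the code that raises the alert.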

If you are still with me, you will have realized that the layers numbered 3, 4 and 5 constitute the Premon system itself. Layers 2 and 6 define the input and output respectively, and layer 1 is the valuable infrastructure and the applications that we plan to monitor intelligently.
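
The flow through layers 3, 4 and 5 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: every class and method name below is hypothetical, the event bus is a plain in-memory queue, and a running mean with a 30% tolerance stands in for whatever learning and correlation logic Premon actually uses. The message counts are the ones from the machine learning example above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class PremonFlowSketch {

    // Hypothetical event: number of messages seen in one peak-time window.
    static final class Event {
        final String source;
        final long messageCount;

        Event(String source, long messageCount) {
            this.source = source;
            this.messageCount = messageCount;
        }
    }

    // Stand-in for the event bus (layer 4): an in-memory FIFO queue.
    static final class EventBus {
        private final Queue<Event> queue = new ArrayDeque<>();

        void publish(Event event) { queue.add(event); }

        // Returns the oldest held event, or null when the bus is empty.
        Event poll() { return queue.poll(); }
    }

    // Assumed plugin contract for tools the controller pulls from.
    interface PullSource {
        List<Event> fetch();
    }

    // Stand-in for the surveillance controller (layer 3).
    static final class SurveillanceController {
        private final EventBus bus;
        private final List<PullSource> sources = new ArrayList<>();

        SurveillanceController(EventBus bus) { this.bus = bus; }

        void register(PullSource source) { sources.add(source); }

        // Push model: a tool or application hands an event to the controller.
        void push(Event event) { bus.publish(event); }

        // Pull model: the controller fetches events from every registered source.
        void pullAll() {
            for (PullSource source : sources) {
                for (Event event : source.fetch()) {
                    bus.publish(event);
                }
            }
        }
    }

    // Stand-in for layer 5: learns a running mean, then flags large drops.
    static final class BaselineLearner {
        private long samples;
        private double mean;

        void learn(long value) {
            samples++;
            mean += (value - mean) / samples; // incremental mean update
        }

        double mean() { return mean; }

        // True when value is below the learned mean by more than the given fraction.
        boolean isAnomalousDrop(long value, double tolerance) {
            return samples > 0 && value < mean * (1.0 - tolerance);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        SurveillanceController psc = new SurveillanceController(bus);
        BaselineLearner learner = new BaselineLearner();

        // Pull model: three ordinary weekday peaks from an existing tool.
        psc.register(() -> Arrays.asList(
                new Event("nagios", 290_000L),
                new Event("nagios", 300_000L),
                new Event("nagios", 310_000L)));
        psc.pullAll();
        Event event;
        while ((event = bus.poll()) != null) {
            learner.learn(event.messageCount);
        }

        // Push model: today's peak arrives directly from an application.
        psc.push(new Event("app", 130_000L));
        Event today = bus.poll();
        boolean anomaly = learner.isAnomalousDrop(today.messageCount, 0.30);
        System.out.println("baseline=" + learner.mean() + " anomaly=" + anomaly);
    }
}
```

A real deployment would need per-time-window baselines (weekday mornings versus lunch hours, for instance) and would correlate several statistics at once; the single running mean here only shows where learning and detection sit relative to the controller and the bus.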

If you are interested in trying out the alpha release of the Premon system for free and would like to be our beta site, please write to us at premon@inphina.com.

About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development. Knoldus does niche Reactive and Big Data product development on Scala, Spark and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). To know more, send a mail to hello@knoldus.com or visit www.knoldus.com

2 Responses to Premon Architecture

  1. Orlando Costa says:

    Hi Vikas
    About this Correlation Engine, is it based on a classic production rule system using the RETE algorithm? If so, and you are using that engine to process a high volume of incoming events, does it show enough performance?
    Best Regards
    Orlando
