When websites or mission-critical services go down they can sometimes make headline news. O2 were taken off the Conduce Christmas card list last week when their nationwide mobile phone outage affected almost our entire team. At Conduce Software we have a number of measures in place to ensure our web applications are as resilient and robust as possible, but it is never possible to be assured of 100% reliability. We've been using the PRTG Network Monitor tool to help us keep an eye on our various web and server assets for a while. This does a great job of keeping us alerted, but it's quite a complex and technical tool that we wouldn't necessarily want to present to our customers.
The main driver to improve visibility of application status has been our new etechlog application. This is a mission-critical system for our customers, with users potentially anywhere in the world accessing the system 24/7. To help assure users of application status in the event of a problem, I designed a "Server Status" screen which we wanted to include in the management application dashboard.
This morning Mark completed the first phase of development of the monitoring application. Here's a live view of the actual etechlog application status page.
We use the PRTG Network Monitor API, which delivers the status of each device sensor that we have in place. Mark has developed a quick and simple web application that lets us add some additional metadata about those sensors and deliver it to a simple web page which refreshes every 60 seconds. I wanted the status page to be as simple as possible, so that it is obvious if there is a problem and each issue comes with a description of what its consequences would be. The idea is that we can drop that status content into an iframe on any web page, as I have done here. For etechlog we are in the process of adding this page to the dashboard of the management application. We've set up sensors for all of our other customers' hosted applications and have created an Application Status page on our website so that they can view the status of their system at any time.
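For anyone curious how the pieces fit together, here is a rough sketch of the idea in Python. It is not Mark's actual code: the sensor ids, the metadata text, the function names and the shape of the sensor records (an "objid" and a "status" of "Up" when healthy) are all assumptions made for illustration, and the call to the monitoring API itself is left out.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SensorMeta:
    label: str        # customer-friendly name for the sensor
    consequence: str  # what a failure of this sensor means for users


# Hypothetical metadata, keyed by an assumed sensor id.
META = {
    1001: SensorMeta("Web application", "Users cannot log in or view records."),
    1002: SensorMeta("Database", "Data cannot be saved or retrieved."),
}


def summarise(sensors, meta):
    """Merge raw sensor records with our own metadata.

    `sensors` is assumed to be a list of dicts with "objid" and "status"
    keys. Sensors with no metadata entry are left off the customer page.
    """
    rows = []
    for sensor in sensors:
        info = meta.get(sensor["objid"])
        if info is None:
            continue
        ok = sensor["status"] == "Up"
        rows.append({
            "label": info.label,
            "ok": ok,
            # Only describe the consequences when something is wrong.
            "message": "OK" if ok else info.consequence,
        })
    return rows


def render_page(rows, refresh_seconds=60):
    """Build a minimal HTML page that reloads itself every refresh_seconds."""
    items = []
    for row in rows:
        css = "ok" if row["ok"] else "problem"
        items.append(
            f'<li class="{css}">{escape(row["label"])}: {escape(row["message"])}</li>'
        )
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "</head><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
    )
```

The resulting page could then be embedded elsewhere with an ordinary iframe pointing at wherever it is served from. Keeping the consequence text in our own metadata, rather than in the monitoring tool, means the customer-facing wording can change without touching the sensors.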
The next phase will see us show a status history and include additional status values such as server response times and database query results. Let me know what you think.
Author:
Paul Saunders