Saturday, February 21, 2009

Hyperic First Impressions

As someone who is in charge of a large network of different applications and hosts, I spend a decent portion of the day wondering if our software is working correctly. Some of our software is off-the-shelf, but much of it is application-specific, custom-built software. To further complicate things, much of this software interacts with other software and services. Staying "aware" in this type of large heterogeneous environment is serious work, and in the past we've developed custom apps to monitor other custom apps. Clearly there must be a better way.

BEEP!
Check out this unrelated picture I added to spice things up!

Enter Hyperic.

I learned about Hyperic through this pretty interesting round-up. For those of you who aren't familiar with it, Hyperic is an open source, Java-based monitoring platform. My first thought was that Hyperic kind of looks like Cacti (a rolled up fancy version of MRTG). In fact, it's quite a bit different.

Agent Based

The first big difference you notice between a grapher (like Cacti) and Hyperic is that Hyperic has an agent application. This caused me some initial alarm. I worry about having to install an agent on a bunch of boxes in the field. I also worry about the resources consumed by the agent. Fortunately it's pretty simple to install (just a directory containing a JRE and the software). The agent is also really smart at monitoring its own processor utilization, so you can keep tabs on how much work it's actually doing.

My agent-related fears assuaged, I started setting up a small Hyperic pilot test. Server installation was pretty easy on Linux. For the pilot I decided to use the built-in DBMS (Postgres) rather than setting up an outboard MySQL database. The server started up easily and configured itself. After setting up a few credentialed users I started installing agents.

Understanding Hyperic

When your first agent connects to Hyperic you start to learn about the system. Hyperic is pretty opaque at first. A new user is immediately confronted with the notions of Platforms, Servers, Services and more. The key to wrapping your head around Hyperic is understanding that a physical server is not a "Server" in Hyperic. It's a "Platform". This is important. It's called a platform because it can run multiple servers (FTP, HTTP, SQL, etc). A server process can then host multiple services. For example: IIS can serve multiple web sites (vhosts). Understanding this hierarchy is key to understanding the way that Hyperic monitors your environment.
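To make the hierarchy concrete, here's a rough Python sketch of how I picture it. This is not Hyperic's API or its internal data model, just an illustration; the class names, the host name `web01` and the site names are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str  # e.g. one web site (vhost) served by IIS


@dataclass
class Server:
    name: str  # a server process such as IIS or MySQL
    services: List[Service] = field(default_factory=list)


@dataclass
class Platform:
    name: str  # the machine itself, which Hyperic calls a "Platform"
    servers: List[Server] = field(default_factory=list)

    def service_paths(self) -> List[str]:
        """Flatten the hierarchy into 'platform/server/service' paths."""
        return [
            f"{self.name}/{server.name}/{service.name}"
            for server in self.servers
            for service in server.services
        ]


# Hypothetical box: one platform, two servers, three services.
box = Platform(
    name="web01",
    servers=[
        Server("IIS", [Service("intranet"), Service("public-site")]),
        Server("MySQL", [Service("orders-db")]),
    ],
)
print(box.service_paths())
```

The point is only the nesting: one platform holds many servers, and each server holds many services.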

So having now added our first platform, we start to get a feel for the power that Hyperic brings to the table. Hyperic begins auto-discovering servers on each platform it's installed on. So on my first box it discovered IIS, .Net 2.0 and MySQL. It immediately provided detailed monitoring capabilities. When I say detailed I don't mean: How many MySQL processes are running? I mean: How many rows are in table X in database B?

Hyperic immediately provides granular monitoring of installed services.

RAR! I am pile of servers! HEAR ME ROAR!
Check out this other unrelated picture I added to show some contrast to the first unrelated picture!

It doesn't stop with "installed" services either. The next week I added a new service on the machine. Within hours, Hyperic had discovered the new server and made it available for monitoring.

Auto-discovery is all well and good, but it doesn't help us monitor those oddball custom applications. How do we find out if our FTP Uploader application is behind on its transfers? Well, Hyperic provides us two easy mechanisms to do this. The first is at the SQL level: I can have Hyperic monitor the result of an SQL query via the database server's agent. The other option I have is to go to the FTP Uploader's platform and modify its "inventory" manually. I can tell it about the input queue directory and have Hyperic monitor the number of files inside.
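For a sense of what "number of files in the queue directory" boils down to as a metric, here's a minimal Python sketch. It has nothing to do with how Hyperic collects the number internally; the function name and the throwaway demo directory are my own stand-ins for the real uploader queue.

```python
import os
import tempfile


def count_queued_files(queue_dir: str) -> int:
    """Return the number of regular files directly inside queue_dir."""
    with os.scandir(queue_dir) as entries:
        return sum(1 for entry in entries if entry.is_file())


# Demonstration against a temporary directory standing in for the real queue.
with tempfile.TemporaryDirectory() as demo_queue:
    for name in ("a.dat", "b.dat", "c.dat"):
        with open(os.path.join(demo_queue, name), "w") as handle:
            handle.write("pending")
    # Subdirectories are not counted as queued files.
    os.mkdir(os.path.join(demo_queue, "archive"))
    backlog = count_queued_files(demo_queue)
    print(backlog)
```

A number like this only becomes useful once it's tracked over time: a queue that keeps growing is the signal that the uploader is falling behind.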

Alerting

Which brings us to alerting. Hyperic has a pretty cool system for defining alerting criteria. Anyone who's managed a real time environment knows that false positives are a real problem. To prevent this, Hyperic has threshold-based alerting. This allows you to say: Only alert me if the disk is full for more than 1 hour in a 12 hour period. This functionality prevents regular fluctuations from causing spurious alerts.
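The "more than 1 hour in a 12 hour period" idea is easy to sketch in Python. To be clear, this is my own toy version and not Hyperic's implementation: the class name, the 5-minute sample interval, and the assumption that each bad sample counts for one full interval are all mine.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedThreshold:
    """Fire only when a condition has been bad for more than
    min_bad_seconds within the trailing window_seconds."""

    def __init__(self, min_bad_seconds: float, window_seconds: float,
                 sample_interval: float) -> None:
        self.min_bad_seconds = min_bad_seconds
        self.window_seconds = window_seconds
        self.sample_interval = sample_interval
        self.samples: Deque[Tuple[float, bool]] = deque()

    def observe(self, timestamp: float, is_bad: bool) -> bool:
        """Record one sample and report whether an alert should fire."""
        self.samples.append((timestamp, is_bad))
        # Drop samples that have fallen out of the trailing window.
        while self.samples[0][0] <= timestamp - self.window_seconds:
            self.samples.popleft()
        # Assumption: each bad sample stands for one full sample interval.
        bad_seconds = sum(self.sample_interval
                          for _, bad in self.samples if bad)
        return bad_seconds > self.min_bad_seconds


HOUR = 3600
check = WindowedThreshold(min_bad_seconds=1 * HOUR,
                          window_seconds=12 * HOUR,
                          sample_interval=300)

# A 30-minute blip (six 5-minute samples) stays under the one-hour threshold.
blip = [check.observe(t * 300, True) for t in range(6)]
# If the problem continues, the accumulated bad time crosses one hour.
sustained = [check.observe(t * 300, True) for t in range(6, 14)]
print(any(blip), any(sustained))
```

The short blip never raises an alert; the sustained problem does once the bad time inside the window exceeds an hour.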

Once an alert is triggered Hyperic follows user-defined "escalation trees". An escalation tree is designed to gradually increase the loudness of your system's desperate cries for help. For example: suppose a process, for which I'm responsible, begins to choke. An escalation tree might send me an email. But since it's 2am on a Tuesday I'm hardly checking my email. This process is important though, so after waiting a predetermined time, Hyperic decides to send me a text message. Normally that's the kind of alert that I respond to, but on this particular Tuesday night I'm sleeping one off, so I forget to charge my iPhone. Hyperic decides that I'm clearly not the right man for the job and can thereafter begin alerting other people or group mailing lists. All of this is readily configurable and pretty cool stuff. It tracks all sorts of data under the hood as well (i.e., who fixed what, when and how).
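Here's a small Python sketch of that escalation idea, modeled as a simple ordered chain. Again, this is an illustration and not Hyperic's configuration format or API: the step names, the wait times (0, 15 and 45 minutes) and the rule that acknowledging an alert stops further escalation are assumptions I picked for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationStep:
    wait_minutes: int  # minutes after the alert fires before this step runs
    action: str        # plain-text description of the notification


def due_actions(steps: List[EscalationStep], minutes_since_alert: int,
                acknowledged_at: Optional[int] = None) -> List[str]:
    """Return the actions that should have run so far.

    Assumption: once someone acknowledges the alert, no later step runs.
    """
    cutoff = minutes_since_alert
    if acknowledged_at is not None:
        cutoff = min(cutoff, acknowledged_at)
    return [step.action for step in steps if step.wait_minutes <= cutoff]


# Hypothetical chain matching the 2am story.
chain = [
    EscalationStep(0, "email the on-call engineer"),
    EscalationStep(15, "text the on-call engineer"),
    EscalationStep(45, "email the ops mailing list"),
]

print(due_actions(chain, minutes_since_alert=60))
print(due_actions(chain, minutes_since_alert=60, acknowledged_at=20))
```

With nobody responding, all three steps have run after an hour. If I had woken up and acknowledged the alert at the 20 minute mark, the mailing list would never have heard about it.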

Open Source

Hyperic is available in two varieties: An open source version called Hyperic HQ, and a closed source version called Hyperic HQ Enterprise. The enterprise version of the software has a number of cool features that could make my life a bit easier, but when I spoke to a sales representative the price proved to be prohibitively expensive in my environment. They seem to charge a per-server-monitored subscription fee. Unfortunately, for the price they ballpark-quoted me, I could hire a full-time engineer whose job is solely to manage the open source version of the software.

Don't get me wrong though, the enterprise version has some very cool functionality (like the ability for the server to automatically react to certain conditions and send commands). But unfortunately Hyperic seems to charge on an all-or-nothing pricing model, so if I wanted to use advanced features on a small subset of my hardware I'd end up having to create two parallel installations or pay full price for every server I monitor.

What's next

Well, these are all my "first impressions", so over the next couple of months, as I get more involved using Hyperic, I'm going to need to get a grip on how much monitoring is appropriate. I have hundreds of servers that could benefit from this type of monitoring but that might create a load situation on the Hyperic server. Furthermore, there is so much data that we can keep track of - some of it incredibly useful. And some of it is... er... not-so-much. I hope that someone is keeping track of the "Number of Deleted Recurring Appointments" in Exchange. There is clearly a trade-off between performance of the Hyperic server and awareness of the environment. This is true at the network level as well: What if a significant portion of my bandwidth turns out to be monitoring-related messages?

The other factor is that the more systems we begin to monitor using Hyperic, the more Hyperic itself becomes a critical element in our infrastructure. We begin to need to keep tabs on Hyperic itself. This also means that we need to be methodical in our setting up and deleting of alerts. I suspect this will require some level of automation.

To me, this is all very interesting and exciting. To everyone else, my apologies.
