Wednesday, March 10, 2010

Re: [BLUG] Server monitoring tools

If the rest of you want to skip most of this, you might check out the
special note for everyone at the end of this e-mail.

On Wed, Mar 10, 2010 at 11:47:49PM GMT, Kirk Gleason [kgleason@gmail.com] said the following:
> All,
> Recently I went to a demo from a vendor about server / service
> monitoring. One of the presenters kept telling us how "sexy" it all
> was. I do have to admit that you can do some cool stuff with WMI and
> server monitoring. Of course once they were asked about providing the
> same stuff for non-Windows servers, the crowd was presented with the
> blank stare and babbling response that no sales person ever wants to
> give.

It doesn't matter how sexy it is if it doesn't send you an alert
properly when a service is down. I don't care how pretty or not pretty
Nagios looks; it's not a public service.

> However this presentation got me thinking. What is out there? I am
> currently using a Nagios install for availability monitoring and Cacti
> for historical performance graphing. I do like what I can get from the
> NSclient++ and WMI monitoring on the windows machines, but the same
> type of functionality for linux always seem kludgy and clunky to me.
> Maybe it is me. I don't really have any specific example (I am at
> basketball practice for one of my kids as I type this), but it seems
> like I should be able to get more.

Nagios is a lot of configuration, I'll admit, but I've made it much
easier for myself by writing a program that does an SNMP walk and TCP
port scan of the hosts and auto-generates the configuration accordingly.
If you want a copy, I can post it to the list or put it somewhere.
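
To give a rough idea of the approach (this is not the program itself,
just a minimal sketch of the TCP port scan half, with the SNMP walk
left out), something like the following would probe a few ports and
write matching Nagios object definitions. The port list, the
"generic-host" and "generic-service" templates, and a "check_tcp"
command that takes the port as its first argument are all assumptions
about the local Nagios setup.

```python
import socket

# Hypothetical port-to-label map; extend it to suit your hosts.
COMMON_PORTS = {22: "SSH", 25: "SMTP", 80: "HTTP", 443: "HTTPS"}


def scan_ports(address, ports, timeout=1.0):
    """Return the ports on `address` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, timed out, or unreachable: treat the port as closed.
            pass
    return open_ports


def render_config(host_name, address, open_ports):
    """Build Nagios object definitions for one host and its open ports."""
    blocks = [
        "define host {\n"
        "    use        generic-host\n"  # assumed template name
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    ]
    for port in sorted(open_ports):
        label = COMMON_PORTS.get(port, f"TCP-{port}")
        blocks.append(
            "define service {\n"
            "    use                  generic-service\n"  # assumed template
            f"    host_name            {host_name}\n"
            f"    service_description  {label}\n"
            f"    check_command        check_tcp!{port}\n"  # assumed command
            "}\n"
        )
    return "\n".join(blocks)
```

Calling render_config("web1", "192.0.2.10", scan_ports("192.0.2.10",
COMMON_PORTS)) and writing the result to a .cfg file that Nagios
includes would be one way to use it. A real version would want
service-specific checks (check_http, check_ssh, and so on) rather than
a bare TCP connect for everything.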

Thank you for acknowledging that Nagios has one purpose and Cacti has
another. I've been doing performance and availability monitoring and
graphing for over 10 years now and have used a few solutions. And on
each of them there have always been people who want to combine the
graphing with availability notification. But they really are different
things, just like a database is different from a webserver. Sure, you
can use them together and when a service finally fails you can go back
through graphs and see if anything led up to that crash, but programs
that try to combine the two usually don't do either one well
enough.

> I've tried Groundwork Open Source and Zenoss in addition to Nagios
> with various combinations of Cacti and/or mrtg. I always seem to come
> back to Nagios, and I always seem to start immediately looking for
> something else.

Zenoss doesn't seem all that bad, but I was already pretty familiar
with Nagios and Cacti. The other thing that worried me is that I may
start using it and then the company will start closing up parts of it or
make it difficult to add onto it without starting to go the pay route.
I don't mind paying for software, but I usually do my homework enough
that I shouldn't have to. And I don't mind doing the homework because it
pays off.

* SPECIAL NOTE *
Speaking of monitoring, Suso recently bought a couple of domains
(www.bloomingtonnetworks.com) for the purpose of displaying information
about the health of Bloomington's area and regional networks. It is going
to be a portal so that the general public can go there and see what
might be down so that they can better determine what problem they might
be experiencing. Also, the idea is to get companies in the region to
care more about the quality of service of their networks. Mostly
targeting ISPs, web hosting providers, schools, etc.

Right now the website doesn't have anything public, but I do have
a Nagios installation set up that has already been monitoring some
networks. Right now I plan on monitoring these networks:

Bloomington City Government
Indiana University
Smithville
Suso
Egix/Kiva
Comcast
AT&T
Verizon
Internet hosts
Some providers in Indy
Common backbone providers in Indy and Louisville
Popular Internet websites

With many of these networks, it's a little more difficult to monitor
because a network's own website being down might not necessarily mean
that their internet connectivity is down, so I was going to write some
simple clients that people could run from cron or something and "check
in" with a central server. Then if 3+ of them are unavailable at once,
it's more obvious that there is a problem.
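
The server side of that check-in idea might look something like the
sketch below. Nothing here is written yet; the class name, the
10-minute staleness window, and the client IDs are all made up for
illustration. Only the "3 or more missing at once" threshold comes
from the plan above.

```python
import time


class CheckinTracker:
    """Track the last check-in time of each client (hypothetical design)."""

    def __init__(self, max_age=600, threshold=3):
        self.max_age = max_age      # seconds before a client counts as missing
        self.threshold = threshold  # missing clients needed to flag a problem
        self.last_seen = {}

    def check_in(self, client_id, now=None):
        """Record a check-in; `now` can be passed explicitly for testing."""
        self.last_seen[client_id] = time.time() if now is None else now

    def missing(self, now=None):
        """Return the sorted IDs of clients that have not checked in lately."""
        now = time.time() if now is None else now
        return sorted(
            client
            for client, seen in self.last_seen.items()
            if now - seen > self.max_age
        )

    def network_problem(self, now=None):
        """True when at least `threshold` clients are missing at once."""
        return len(self.missing(now)) >= self.threshold
```

The clients run from cron would just hit the server every few minutes
with their ID, and the server would call check_in() for each request.
Keeping one tracker per monitored network would let a single home
connection dropping out be told apart from a whole provider going down.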

If this is something that you'd be interested in running, please let
me know. Also let me know if you'd be interested in helping out with
this project or if you have suggestions for networks/hosts to monitor.

Thanks,
Mark


--
Mark Krenz
Bloomington Linux Users Group
http://www.bloomingtonlinux.org/
_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug
