VO-Centric Ganglia
-- Installation using Pacman --

0. Before starting

       Installation Instructions are here.


1. What, Where, From, Requirements, Licensing, Sensors, etc

       Please check the standard Installation Instructions page located here.


2. Pacman-based Installation
    • Before starting the installation of Ganglia components, a few words about ...
      • ... system requirements:
        • The software was developed on Red Hat Linux 6.2, but it should work without changes under many other Linux distributions;
        • A working gcc compiler, a Perl interpreter (5.6.0 or later preferred), and common UNIX tools (grep, sed, wc, ...) are therefore required;
      • ... the account under which the installation process runs:
        • If it is a regular account, everything is installed in a subdirectory called ganglia (under the current directory); additionally, the setup script changes the configuration files so that Ganglia can be started ONLY by the user who ran the installation (there is a script that should be placed under crontab);
        • If the installation is done as root, everything is installed in /usr/local/ganglia; additionally, the setup process changes the configuration files so that the Ganglia daemons run as user nobody (there are two scripts that must be/are placed in the startup dir);
        • I (Catalin) strongly recommend the former approach (to avoid runaway commands or any other unexpected problems on your system);
      • ... what will happen to the cluster (the big picture):
        • after setup, once the Ganglia daemons are started, each compute node should run an instance of the Ganglia Sensor, which collects local statistics about node usage; each head node (one per cluster) should run both the Ganglia Sensor and the Ganglia Meta-Daemon;
        • monitoring information is multicast among all nodes inside the cluster by the local sensors; each sensor keeps the latest snapshot of the data in memory and provides this snapshot when queried (on TCP port 8649 in the default configuration);
        • the Meta-Daemon stores a history of all numeric information in RRD files (RRAs for 1:240*15, 24:240*15, ...) and can provide the latest snapshot of all collected information in XML (monitoring, head/gatekeeper usage, other useful information) or history views as GIF images (on TCP port 8653 in the default configuration);
        • thus:
          • 8649 does not need to be open for outside access in your firewall; moreover, I recommend blocking it or, at least, configuring gmond to provide this information only to a selected list of IPs;
          • 8653 should be open in your firewall for external queries; if this is not possible, the second choice is to publish your information to other hosts instead of letting them pull the monitoring data (be sure to notify the administrators of the target hosts of your intentions). For the specific case of this demo, the maintainer of the VO-Centric Ganglia on grid02.uchicago.edu [128.135.102.68] (the contact node for all US sites) and grid03.uchicago.edu [128.135.152.126] (the contact node for all EU points) is Catalin Dumitrescu;
    • Install the VO-Centric Ganglia Sensor on each compute (& head) node:
      • pacman -get http://people.cs.uchicago.edu/~cldumitr/pacman:BLGmond
    • Answer the questions printed by the setup script:
      • Cluster Name: <ENTER A STRING THAT BEST REPRESENTS YOUR CLUSTER>;
      • Trusted Hosts: <ENTER IPs FOR ALL THE NODES THAT CAN CONNECT> (at least the head node's IP);
    • Install the VO-Centric Ganglia Meta-Daemon on the head node:
      • pacman -get http://people.cs.uchicago.edu/~cldumitr/pacman:VOGanglia
    • Before running this command, be sure to:
      • install gmond;
      • know where VDT or (Globus and Condor), and, optionally, PBS are installed;
      • set {VDT_LOCATION or (GLOBUS_LOCATION and CONDOR_LOCATION)}, PBS_LOCATION, and GMOND_LOCATION in your shell if pacman does not already do this in your case (otherwise the install scripts will complain, but the process will go through);
    • Answer the questions asked by the setup script:
      • Exported Name: <ENTER A STRING THAT BEST REPRESENTS YOUR CLUSTER>;
      • MDS Usage: <LEAVE EMPTY; THE MDS INTERFACE IS NOT INTENDED TO BE USED IN THIS VERSION>;
      • Trusted Hosts: <ENTER A LIST OF HOSTS THAT SHOULD ACCESS DATA. DON'T FORGET grid02.uchicago.edu [IP: 128.135.102.68] AND/OR grid03.uchicago.edu [IP: 128.135.152.126]. FOLLOW THE PROVIDED EXAMPLE>;
      • Source Hosts: <ENTER A LIST OF HOSTS FROM WHERE DATA SHOULD BE RETRIEVED (AT LEAST A HOST FROM YOUR CLUSTER)>;
      • Target Hosts: <ENTER A LIST OF HOSTS WHERE DATA SHOULD BE SUBMITTED (DON'T FORGET grid02.uchicago.edu [IP: 128.135.102.68] AND/OR grid03.uchicago.edu [IP: 128.135.152.126])>;
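
Once the daemons are running, the port layout described above (8649 for a sensor, 8653 for the Meta-Daemon) can be checked from a trusted host. The following is only a minimal sketch: the function name fetch_snapshot is made up for this illustration, and it assumes the daemon writes its XML snapshot as soon as a client connects and then closes the connection, which is how a standard gmond behaves. The Meta-Daemon on 8653 may expect a query first, in which case the sketch would need adapting.

```python
import socket


def fetch_snapshot(host: str, port: int, timeout: float = 10.0) -> str:
    """Connect to host:port and return everything sent until the peer closes.

    Assumption: the daemon dumps its snapshot on connect and then closes
    the connection; no request is sent by this function.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: snapshot is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example call (not executed here); the address is a placeholder:
#   xml_text = fetch_snapshot("127.0.0.1", 8649)
```

If the connection is refused or times out, the calling host is probably missing from the Trusted Hosts list or the port is blocked by the firewall.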

  • Cluster with standard Ganglia installed
    • You should skip step one, the installation of the VO-Centric Ganglia Sensor. The VO-Centric Ganglia Meta-Daemon should work fine with the standard sensors (my changes to the sensor are minor);
    • Install the VO-Centric Ganglia Meta-Daemon on the head node (to use previously installed host sensors):
      • pacman -get http://people.cs.uchicago.edu/~cldumitr/pacman:VOGanglia
    • Go through the same process as before (New/Fresh Cluster Installation);
    • After completing the setup process, be sure to edit the following files/lines manually:
      • gmond.conf files on at least one node (head node) to allow connections from the Ganglia Meta-Daemon;
      • etc/gmmetad2/gmetad_sensors => the line beginning with Cluster @InfoCluster@: replace @InfoClusters@ with the name of your cluster as specified in the gmond.conf file (uncomment the entire paragraph if it is still commented out with '#');
    • Install the VO-Centric Ganglia Meta-Daemon on the head node (to use standard Ganglia Meta-Daemon):
      • The installation process is similar; the difference lies in the values entered for the (IP address, TCP port) pairs of the source hosts (during the setup process). Instead of answering LOCAL 127.0.0.1 8649 (or whatever IP address the head node has), answer LOCAL 127.0.0.1 8651 (the TCP port on which the standard Ganglia Meta-Daemon listens);
    • "I don't want to install anything" approach:
      • Your site / cluster can still participate with limited monitoring information. The site will not be able to provide VO-specific usage information and will not be able to answer queries sent directly to it (as VO-Centric Ganglia does). Also, it is not able to push data to other hosts; it can only provide information on demand;

  • Cluster with an earlier VO-Centric Ganglia installed
    • Not yet supported; updates should be made available soon;
    • A simple workaround: download gmmetad-2.1.tar.gz, run autoconf && ./configure (with the appropriate parameters), and copy the gmmetad2 and gmmetad2-sensors directories to the <deployment directory>/sbin directory (this should work);

  • The VO-Centric Ganglia Web Interface
    • This package is optional for a site. The installation requires a working Apache web server (the mod_perl module is recommended);
      • pacman -get http://people.cs.uchicago.edu/~cldumitr/pacman:VOGanglia-webfrontend 
    • Changes to the Apache configuration file are not made automatically; they are described below:
      • find and open for editing: httpd.conf (as www user or root);
      • A. add the following lines (if mod_perl is enabled):

  # HTML directory setup for Nagios (TM) images and styles
  Alias /nagios "GANGLIA_DIR/webfrontend/nagios1/share"
  <Directory "GANGLIA_DIR/webfrontend/nagios1/share">
     Options Indexes FollowSymLinks MultiViews IncludesNoExec
     AddOutputFilter Includes html
     AllowOverride None
     Order allow,deny
     Allow from all
  </Directory>

  # CGI dir configuration for the Nagios (TM) imported web-interface
  ScriptAlias /nagios/cgi-bin "GANGLIA_DIR/webfrontend/nagios1/sbin"
  <Directory GANGLIA_DIR/webfrontend/nagios1/sbin>
     AllowOverride AuthConfig
     order allow,deny
     allow from all
     Options ExecCGI
  </Directory>

  # Main CGI dir configuration for a server with mod perl enabled
  Alias  /sstat2 "GANGLIA_DIR/webfrontend/gmmetad2"
  PerlModule ModPerl::Registry
  <Directory GANGLIA_DIR/webfrontend/gmmetad2>
      SetHandler perl-script
      PerlHandler ModPerl::Registry
      Options +ExecCGI
      PerlSendHeader On
      allow from all
  </Directory>

  • B. or add the following lines (if mod_perl is not enabled on your host):
  # HTML directory setup for Nagios (TM) images and styles
  Alias /nagios "GANGLIA_DIR/webfrontend/nagios1/share"
  <Directory "GANGLIA_DIR/webfrontend/nagios1/share">
     Options Indexes FollowSymLinks MultiViews IncludesNoExec
     AddOutputFilter Includes html
     AllowOverride None
     Order allow,deny
     Allow from all
  </Directory>

  # CGI dir configuration for the Nagios (TM) imported web-interface (same as before)
  ScriptAlias /nagios/cgi-bin "GANGLIA_DIR/webfrontend/nagios1/sbin"
  <Directory GANGLIA_DIR/webfrontend/nagios1/sbin>
     AllowOverride AuthConfig
     order allow,deny
     allow from all
     Options ExecCGI
  </Directory>

  # Main CGI dir configuration for a server without mod perl enabled
  ScriptAlias  /sstat2 "GANGLIA_DIR/webfrontend/gmmetad2"
  <Directory GANGLIA_DIR/webfrontend/gmmetad2>
     AllowOverride AuthConfig
     order allow,deny
     allow from all
     Options ExecCGI
  </Directory>


  • Note: if the Nagios(TM) interface database is not enabled in the Ganglia main configuration file, then only the last block (the /sstat2 configuration) should be added to the httpd.conf file;
  • don't forget to replace all "GANGLIA_DIR" occurrences above with your particular target directory;
  • restart the web server and try to access http://server-addr.xxx:port/sstat2/;
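
Replacing the GANGLIA_DIR placeholder by hand is error-prone when several blocks are pasted. A minimal sketch of doing the substitution programmatically follows; the helper name fill_ganglia_dir is invented for this illustration, and /usr/local/ganglia is used only because it is the root-install location mentioned earlier. Review the result before copying it into httpd.conf.

```python
def fill_ganglia_dir(config_text: str, target_dir: str) -> str:
    """Return config_text with every GANGLIA_DIR placeholder replaced."""
    # Drop a trailing slash so "GANGLIA_DIR/webfrontend" does not become "//".
    return config_text.replace("GANGLIA_DIR", target_dir.rstrip("/"))


# A shortened fragment of the configuration shown above.
fragment = '''ScriptAlias /sstat2 "GANGLIA_DIR/webfrontend/gmmetad2"
<Directory GANGLIA_DIR/webfrontend/gmmetad2>
    Options ExecCGI
</Directory>
'''

print(fill_ganglia_dir(fragment, "/usr/local/ganglia/"))
```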

  • Nagios(TM) Interface
    • It consists of a single small C program that must be compiled with the standard Nagios(TM) 1.04b distribution. Its main function is to parse the Nagios(TM) server data files, build an XML string representing the latest snapshot of the monitoring information, and provide this information on TCP port 8653 when queried remotely. It is a test version with limited functionality right now;


3. A few more words about ...
  • Start/Stop-ing Sensors and Meta-Daemons:
    • there are two methods: the provided startup scripts, which are copied automatically to /etc/rc.d/init.d and linked under rc3.d and rc5.d when make install-startup is issued as root, and/or the gmmetad2.crontab script, which should be installed in crontab and which checks whether the gmond/gmmetad2 processes are running.
  • SSL-enabled communication:
    • The latest distribution of the VO-Centric Ganglia Meta-Daemon includes specific code for using IO::Socket::SSL connections in addition to standard IO::Socket. The drawback is that all nodes in the tree must have this option enabled (for the 2.1 release);
    • To enable this option, place use_ssl={0x00 | 0x01 | 0x02} in the main configuration file and the appropriate certificates in the certs directory; that is all the information I provide about this part.
  • Retrieving Globus, Condor, and PBS information:
    • There are several scripts that interact with the local head/gatekeeper node. Most of them are placed in the sbin/gmmetad2-sensors/ directory; they access, for example, the gatekeeper log file, scheduler log files, the Condor queue, the PBS queue and statistics files, local node information about running processes, and so on. If one piece does not work correctly, some of the information may not be collected.
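
The crontab-based method mentioned under Start/Stop-ing boils down to checking whether the gmond and gmmetad2 processes exist. The sketch below illustrates that idea only; it is not the contents of the gmmetad2.crontab script, and both function names are invented here. It assumes the POSIX form ps -e -o comm= is available and that the daemons appear under their own command names. If gmmetad2 runs under the Perl interpreter, it may show up as perl instead, and the match would need adjusting.

```python
import subprocess


def running_process_names():
    """List the command names of all processes, via `ps -e -o comm=`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def missing_daemons(process_names, expected=("gmond", "gmmetad2")):
    """Return the expected daemon names that are absent from process_names."""
    running = {name.strip() for name in process_names if name.strip()}
    return [name for name in expected if name not in running]

# Example call (not executed here):
#   for name in missing_daemons(running_process_names()):
#       print(name, "is not running")
```

A cron job built on this idea would only report the missing daemons; restarting them is left to the startup scripts.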

4. Open Issues
  • Perl 5.005 and XML::Expat (and possibly other versions):
    • On some Linux variants, the installation process for the Expat module seems to place modules in directories that are not in the standard Perl @INC. If you see a message like "Cannot load XML::Parser ...", you have to move several files manually to a location visible to Perl. I avoided this by copying files from GANGLIA_LOCATION/lib/perl5/site_perl/5.*/i*86-linux/ to GANGLIA_LOCATION/lib/perl and from GANGLIA_LOCATION/lib/perl5/site_perl/5.*/i*86-linux/auto/XML/Parser/Expat/* to GANGLIA_LOCATION/lib/perl.
  • Running multiple instances of VO-Centric Ganglia on the same node:
    • to come
  • Installing RRD on NFS mounted partitions:  
    • rrdtool complains when it is installed on a shared file partition. It uses an NFS-unaware lock mechanism, which causes failures when updating the databases. My solution, for the particular case of a CS@UofC cluster, was to move the rrds files to /tmp. This can easily be done by changing rrd_dir in the etc/gmmetad2/gmetad_config file to point to something like /tmp/ganglia (please don't forget to create this directory first).
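
Before choosing a value for rrd_dir, it can help to check whether a candidate directory sits on an NFS mount. The following is a rough sketch under stated assumptions: the function names are invented for this illustration, the mount table is expected in the Linux /proc/mounts layout (device, mount point, type, ...), symbolic links are not resolved, and mount points containing spaces are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.abspath(path) + "/"
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point override an earlier one.
        if path.startswith(prefix) and len(prefix) >= best_len:
            best_len, best_type = len(prefix), fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lives on an nfs or nfs4 mount according to mounts_text."""
    return filesystem_type(path, mounts_text) in ("nfs", "nfs4")

# Example call (not executed here):
#   with open("/proc/mounts") as f:
#       print(is_on_nfs("/tmp/ganglia", f.read()))
```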

5. Related Links
  • Pacman 
  • VDT / Globus, Condor
  • PBS


Page created by:  Catalin Lucian Dumitrescu