iVDGL:policy-mon Module
-- Installation using Pacman --

0. Before starting
  • Installation Instructions are here.
  • Pacman Installation (long) Instructions are here.
  • RPM-based Installation Instructions are here.
  • Download & Install Pacman from http://physics.bu.edu/~youssef/pacman/
  • Notes: 
    • Steps must be executed in order!
    • Installation can be done as root or as a regular user.

1. Pacman-based Installation
  • Make sure no standard Ganglia installation is running. If one is:
    • skip the next step, but make sure the host sensor (gmond) is installed on your head node;
    • set GMOND_LOCATION before you start installing policy-mon.
  • Install the iVDGL:policy-mon Sensor on each compute (and head) node:
      • pacman -get iVDGL:ivdgl_policy-sens 
    • Answer the questions asked by the setup script (the latest version of the setup script asks no questions):
      • Cluster Name: unique name to identify your cluster [iVDGL:headnode_domain Cluster]
  • Install the iVDGL:policy-mon Meta-Daemon on the head node:
      • pacman -get iVDGL:ivdgl_policy-mon
    • Before running this command, make sure that you:
      • have installed the iVDGL:policy-mon Sensor, or have gmond running on your head node;
      • have set the following variables in your shell, unless pacman already sets them for you:
        • VDT_LOCATION, or both GLOBUS_LOCATION and CONDOR_LOCATION;
        • PBS_LOCATION.
      • If these variables are not set, the installation will still succeed, but your site will not publish any VO-related information.
    • Answer the questions asked by the setup script (the latest version of the setup script asks no questions):
      • Exported Name: unique name to identify your head node [iVDGL:headnode_domain Site]
      • Target Hosts: a target host to which monitoring data is sent continuously [US_iVDGL 128.135.102.68 8653]
  • If any installation step fails, first try it once more. If the problem persists, please contact Catalin Dumitrescu.
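The variable requirements in the steps above are easy to get wrong, so a pre-flight check can help before running pacman. The following is a minimal sketch, assuming a hypothetical helper named preflight that only restates the guide's rules (GMOND_LOCATION when a standard Ganglia is already running; VDT_LOCATION or both GLOBUS_LOCATION and CONDOR_LOCATION; PBS_LOCATION). It is not part of the iVDGL:policy-mon distribution, and it does not detect a running Ganglia itself: you pass that fact in.

```python
import os
from typing import List, Mapping


def _is_set(env: Mapping[str, str], name: str) -> bool:
    # Treat unset and empty/whitespace-only values the same way.
    return bool(env.get(name, "").strip())


def preflight(env: Mapping[str, str], ganglia_running: bool = False) -> List[str]:
    """Return warnings for the variables the installation steps ask for.

    Hypothetical helper: it only restates the rules of the installation
    guide and is not shipped with iVDGL:policy-mon.
    """
    warnings: List[str] = []
    # With a standard Ganglia already running, the guide asks for GMOND_LOCATION.
    if ganglia_running and not _is_set(env, "GMOND_LOCATION"):
        warnings.append("a standard Ganglia is running: set GMOND_LOCATION")
    # Per the guide, without these variables the install still succeeds,
    # but the site publishes no VO-related information.
    has_vdt = _is_set(env, "VDT_LOCATION")
    has_globus_condor = _is_set(env, "GLOBUS_LOCATION") and _is_set(env, "CONDOR_LOCATION")
    if not (has_vdt or has_globus_condor):
        warnings.append("set VDT_LOCATION, or both GLOBUS_LOCATION and CONDOR_LOCATION")
    if not _is_set(env, "PBS_LOCATION"):
        warnings.append("set PBS_LOCATION")
    return warnings


if __name__ == "__main__":
    # ganglia_running is left at its default; this sketch does not detect Ganglia.
    for message in preflight(os.environ):
        print("warning:", message)
```

An empty list means every variable the guide mentions is set; each warning names the variable to export before installing the meta-daemon.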

2. Starting
  • iVDGL:policy-mon Sensors :
    • ROOT: started by the startup script gmond.init, which the installation process places automatically in /etc/rc.d/init.d and links under rc3.d and rc5.d;
    • USER (both sensors and meta-daemon): started by gmmetad2.crontab, which you must install in your crontab. It first checks for running gmond/gmmetad2 instances and, if any are missing, restarts them by re-running gmond.init and gmmetad2.init.
    • Note: sensors must run on every worker node of your cluster. If the installation is done on a shared filesystem, you have to:
      • as ROOT: manually copy gmond.init to /etc/rc.d/init.d on every other node and link it under the rc3.d and rc5.d directories;
      • as USER: install the gmmetad2.crontab script in the crontab of each worker node.
  • iVDGL:policy-mon Meta-Daemon:
    • ROOT: started by the startup script gmmetad2.init, which the installation process places automatically in /etc/rc.d/init.d and links under rc3.d and rc5.d;
    • USER: started by the same gmmetad2.crontab used for the sensors, which checks for running gmond/gmmetad2 instances and restarts any missing component by re-running gmond.init and gmmetad2.init. If you have already installed it for the sensors, there is nothing more to do.
    • Note: only one meta-daemon instance must run for your whole cluster; the installation process should take care of this for you.
  • Retrieving Globus, Condor, and PBS information:
    • Several scripts interact with the local head/gatekeeper node. Most of them are located in the sbin/gmmetad2-sensors directory; they access, for example, the gatekeeper log file, scheduler log files, the Condor pool, the PBS queue, the RRD files, and information about running processes;
    • iVDGL:policy-mon collects information from these sources only if they are available: if a component is not installed on your system, the associated information is simply not collected.
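The crontab behaviour described above (check for running gmond/gmmetad2 instances, restart whichever is missing by re-running its init script) can be sketched as follows. This is a hypothetical illustration, not the contents of gmmetad2.crontab: the COMPONENTS map and its paths are placeholders, process detection uses the standard `pgrep -x` command, and the assumption that the init scripts accept a SysV-style "start" argument is ours. On a worker node you would list only the components that should run there (for example, only gmond, since the meta-daemon runs once per cluster).

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical map of daemon name -> init script. Replace the placeholder
# paths with the gmond.init / gmmetad2.init locations of your installation.
COMPONENTS: Dict[str, str] = {
    "gmond": "/path/to/gmond.init",
    "gmmetad2": "/path/to/gmmetad2.init",
}


def process_running(name: str) -> bool:
    # pgrep -x exits with status 0 if a process with exactly this name exists.
    result = subprocess.run(["pgrep", "-x", name],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def run_init_script(script: str) -> None:
    # Assumption: the init scripts accept a SysV-style "start" argument.
    subprocess.run([script, "start"], check=False)


def watchdog(components: Dict[str, str],
             check: Callable[[str], bool] = process_running,
             launch: Callable[[str], None] = run_init_script) -> List[str]:
    """Restart every listed component that is not running; return their names."""
    restarted: List[str] = []
    for name, script in components.items():
        if not check(name):
            launch(script)
            restarted.append(name)
    return restarted
```

Calling watchdog(COMPONENTS) from a periodic job mirrors the restart-if-missing logic; the check and launch parameters exist only so the logic can be exercised without touching real processes.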

3. Checking your installation
  • iVDGL:policy-mon Sensor check: telnet headnode 8649 -> a large amount of XML data should be returned
  • iVDGL:policy-mon Meta-Daemon check: telnet headnode 8653 -> enter xml_sum_request -> again, a large amount of XML data should be returned
  • Go to http://people.cs.uchicago.edu/~cldumitr/Ganglia/index.pl and wait for the monitoring information to reach the top node (about 3 × 30 s, i.e. roughly 90 s)
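The two telnet checks above can also be scripted, which is handy when checking many nodes. The following is a minimal sketch, assuming that the sensor on port 8649 sends its XML as soon as a client connects and closes the connection when done, and that the meta-daemon on port 8653 accepts the request xml_sum_request terminated by a newline (telnet itself sends CR LF, so adjust the terminator if the daemon does not answer). The function names fetch_xml and looks_like_xml are hypothetical, and looks_like_xml is only a cheap sanity check, not XML validation.

```python
import socket
from typing import List, Optional


def fetch_xml(host: str, port: int, request: Optional[str] = None,
              timeout: float = 10.0) -> str:
    """Connect to host:port, optionally send a one-line request, and return
    everything the server sends until it closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if request is not None:
            # Assumption: a newline-terminated request, as typed in telnet.
            sock.sendall((request + "\n").encode("ascii"))
        chunks: List[bytes] = []
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_like_xml(text: str) -> bool:
    # Cheap sanity check only: non-empty output that starts with a tag.
    return text.lstrip().startswith("<")
```

For example, fetch_xml("headnode", 8649) mirrors the sensor check, and fetch_xml("headnode", 8653, request="xml_sum_request") mirrors the meta-daemon check; "headnode" stands for your own head node's host name.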


Page created by:  Catalin Lucian Dumitrescu