iVDGL:policy-mon Module
-- Installation using Pacman --
0. Before starting
- Separate general installation instructions, long-form Pacman installation instructions, and RPM-based installation instructions are also available.
- Download and install Pacman from http://physics.bu.edu/~youssef/pacman/
- Notes:
  - Steps must be executed in order!
  - Installation can be done as root or as a regular user.
1. Pacman-based Installation
- Make sure no standard Ganglia is running; otherwise:
  - skip the next step (the Sensor installation) and make sure the host sensor (gmond) is installed on your headnode;
  - set GMOND_LOCATION (the location of your existing gmond installation) before starting to install policy-mon.
- Install the iVDGL:policy-mon Sensor on each compute (and head) node:
  - pacman -get iVDGL:ivdgl_policy-sens
  - Answer the questions printed by the setup script (the latest version of the setup script does not ask any questions):
    - Cluster Name: a unique name to identify your cluster [iVDGL:headnode_domain Cluster]
- Install the iVDGL:policy-mon Meta-Daemon on the head node:
  - pacman -get iVDGL:ivdgl_policy-mon
  - Before running this command, make sure to:
    - install the iVDGL:policy-mon Sensor or have a running gmond on your headnode;
    - set the following variables in your shell, if pacman does not already do this for you:
      - VDT_LOCATION or (GLOBUS_LOCATION and CONDOR_LOCATION)
      - PBS_LOCATION
      - if they are not set, the installation will still succeed, but no VO-related information will be published by your site.
  - Answer the questions asked by the setup script (the latest version of the setup script does not ask any questions):
    - Exported Name: a unique name to identify your headnode [iVDGL:headnode_domain Site]
    - Target Hosts: a target host to which monitoring data is sent continuously [US_iVDGL 128.135.102.68 8653]
- If any installation step fails, please try it once more first. If the problem persists, please contact Catalin Dumitrescu.
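Step 1 has two preconditions: no conflicting gmond may already be running, and the shell variables the Meta-Daemon needs must be set. Both can be checked before running pacman -get. The Python script below is a minimal sketch and is not part of the policy-mon distribution. Its function names are hypothetical. It assumes that a running gmond answers on TCP port 8649 of the local host, which is the port used by the Sensor check in section 3.

```python
import os
import socket
from typing import List, Mapping

# Assumption: a running gmond (standard Ganglia or the policy-mon Sensor)
# accepts TCP connections on this port, as in the section 3 telnet check.
GMOND_PORT = 8649


def port_in_use(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_vo_env(env: Mapping[str, str]) -> List[str]:
    """List the Meta-Daemon settings that are still missing.

    Rule from step 1: VDT_LOCATION, or both GLOBUS_LOCATION and
    CONDOR_LOCATION, plus PBS_LOCATION. Empty values count as unset.
    """
    missing = []
    has_vdt = bool(env.get("VDT_LOCATION"))
    has_globus_condor = bool(env.get("GLOBUS_LOCATION")) and bool(env.get("CONDOR_LOCATION"))
    if not (has_vdt or has_globus_condor):
        missing.append("VDT_LOCATION or (GLOBUS_LOCATION and CONDOR_LOCATION)")
    if not env.get("PBS_LOCATION"):
        missing.append("PBS_LOCATION")
    return missing


def main() -> None:
    if port_in_use("localhost", GMOND_PORT):
        print(f"A gmond already answers on port {GMOND_PORT}: if it is a standard "
              "Ganglia gmond, skip the Sensor install and set GMOND_LOCATION.")
    for item in missing_vo_env(os.environ):
        print(f"Not set: {item} (no VO-related information will be published)")


if __name__ == "__main__":
    main()
```

Run it on the head node from the same shell you will use for pacman -get, so that it sees the same environment. A gmond found on port 8649 may also be a previously installed policy-mon Sensor, so treat that message as a prompt to check rather than as an error.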
2. Starting
- iVDGL:policy-mon Sensors:
  - ROOT: by means of the startup script (gmond.init), placed automatically in /etc/rc.d/init.d and linked under rc3.d and rc5.d by the installation process;
  - USER (both sensors and meta-daemon): by means of gmmetad2.crontab, which must be installed in your crontab. It first checks for running gmond/gmmetad2 instances and, if any are missing, restarts the missing components (by re-running gmond.init and gmmetad2.init).
  - Note: sensors must run on all worker nodes of your cluster. If the installation is done on a shared filesystem, then you have to:
    - as ROOT: manually copy the gmond.init file to /etc/rc.d/init.d on each of the other nodes and link it under the rc3.d and rc5.d directories;
    - as USER: install the gmmetad2.crontab script in the crontab of each worker node.
- iVDGL:policy-mon Meta-Daemon:
  - ROOT: by means of the startup script (gmmetad2.init), placed automatically in /etc/rc.d/init.d and linked under rc3.d and rc5.d by the installation process;
  - USER: by means of gmmetad2.crontab, which must be installed in your crontab. It first checks for running gmond/gmmetad2 instances and, if any are missing, restarts the missing components (by re-running gmond.init and gmmetad2.init). This is the same crontab already installed for the Sensors, so no extra step is needed!
  - Note: there must be only one meta-daemon instance for your cluster; the installation process should take care of this for you.
- Globus, Condor, and PBS information retrieval:
  - Several scripts interact with the local head/gatekeeper node. Most of them are placed in the sbin/gmmetad2-sensors directory; they access, for example, the gatekeeper log file, scheduler log files, the Condor pool, the PBS queue, the RRD files, information about running processes, and so on;
  - iVDGL:policy-mon collects information from these sources if and only if they are available. If a component is not installed on your system, the associated information is not collected.
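gmmetad2.crontab follows a simple restart logic: it checks for gmond/gmmetad2 and re-runs the init script of whichever is missing. The sketch below illustrates that logic; it is not the actual crontab script. The init-script paths in COMPONENTS are hypothetical placeholders. The sketch also assumes a Linux /proc filesystem that exposes /proc/<pid>/comm (modern kernels), and that the init scripts accept the conventional SysV "start" argument.

```python
import os
import subprocess
from typing import Dict, List, Set

# Hypothetical paths: point these at the gmond.init and gmmetad2.init
# scripts created by your policy-mon installation.
COMPONENTS: Dict[str, str] = {
    "gmond": "/path/to/policy-mon/gmond.init",
    "gmmetad2": "/path/to/policy-mon/gmmetad2.init",
}


def running_process_names(proc_root: str = "/proc") -> Set[str]:
    """Collect the command names of running processes from /proc/<pid>/comm."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/cpuinfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # the process exited during the scan, or comm is unreadable
    return names


def missing_components(running: Set[str], expected: Dict[str, str]) -> List[str]:
    """Return, sorted, the expected daemons that are not currently running."""
    return sorted(name for name in expected if name not in running)


def restart_missing(expected: Dict[str, str] = COMPONENTS) -> List[str]:
    """Start every missing daemon via its init script; return their names."""
    missing = missing_components(running_process_names(), expected)
    for name in missing:
        try:
            # Assumption: the init scripts accept the SysV-style "start" argument.
            subprocess.run([expected[name], "start"], check=False)
        except OSError as exc:
            print(f"could not restart {name}: {exc}")
    return missing


if __name__ == "__main__":
    restart_missing()
```

When run periodically from a user crontab, a script like this restarts a daemon that has died at its next run. Checking by process name is simple but coarse: a daemon whose process name differs from the expected one would be reported as missing.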
3. Checking your installation
- iVDGL:policy-mon Sensor check: telnet headnode 8649 (replace headnode with your head node's host name) -> a lot of XML data should be returned
- iVDGL:policy-mon Meta-Daemon check: telnet headnode 8653 -> enter xml_sum_request -> again, a lot of XML data should be returned
- Go to http://people.cs.uchicago.edu/~cldumitr/Ganglia/index.pl and wait for the monitoring info to reach the top node (about 3 x 30 s, i.e. roughly 90 s)
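The two telnet checks above can also be scripted. The sketch below:
- opens a TCP connection the way telnet does;
- optionally sends one request line (xml_sum_request for the Meta-Daemon);
- reads until the daemon closes the connection or goes quiet;
- parses the reply with Python's standard XML parser to confirm it is well-formed.

The function names are hypothetical. The sketch assumes both daemons send one complete XML document per connection.

```python
import socket
import sys
import xml.etree.ElementTree as ET
from typing import Optional

SENSOR_PORT = 8649        # iVDGL:policy-mon Sensor (gmond)
META_DAEMON_PORT = 8653   # iVDGL:policy-mon Meta-Daemon (gmmetad2)


def fetch_xml(host: str, port: int, request: Optional[str] = None,
              timeout: float = 10.0) -> bytes:
    """Connect like `telnet host port`, optionally send one request line,
    and return everything the daemon sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if request is not None:
            # Pressing Enter in telnet sends CR LF after the typed text.
            sock.sendall(request.encode("ascii") + b"\r\n")
        while True:
            try:
                data = sock.recv(65536)
            except socket.timeout:
                break  # no more data within the timeout; keep what arrived
            if not data:
                break  # the daemon closed the connection
            chunks.append(data)
    return b"".join(chunks)


def root_tag(payload: bytes) -> str:
    """Parse the reply and return the name of its root element.

    Raises xml.etree.ElementTree.ParseError if the reply is not well-formed XML.
    """
    return ET.fromstring(payload).tag


def main(host: str) -> None:
    sensor = fetch_xml(host, SENSOR_PORT)
    print(f"Sensor: {len(sensor)} bytes, root <{root_tag(sensor)}>")
    meta = fetch_xml(host, META_DAEMON_PORT, request="xml_sum_request")
    print(f"Meta-Daemon: {len(meta)} bytes, root <{root_tag(meta)}>")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Pass the head node's host name as the only argument. The possible failures map to the telnet checks:
- ConnectionRefusedError: the corresponding daemon is not listening.
- ParseError: the reply was truncated or was not XML.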
Page created by:
Catalin Lucian Dumitrescu