Glidein Factory monitoring
Monitoring is an essential part of any service: it is needed both to maintain the health of the system and to tune it for anticipated growth. The various ways you can monitor a Glidein Factory are described below.
Log files
The Factory Daemon, the Entry Daemons, Condor-G and the Glideins all write extensive log files. The logs are kept for a week and then deleted.
Log file locations in glideinWMS v2.3.x and earlier:
The Glidein Factory Daemon log files are located in
<glidein directory>/log/factory_info.<date>.log
<glidein directory>/log/factory_err.<date>.log
Each Entry Daemon has its log files in
<glidein directory>/entry_<entry name>/log/factory_info.<date>.log
<glidein directory>/entry_<entry name>/log/factory_err.<date>.log
Each Entry Daemon uses one Condor-G job log for each client it is serving:
<glidein directory>/entry_<entry name>/log/condor_activity_<date>_<client-name>.log
Each Glidein also writes a couple of log files, which are transferred back to the factory node after the glidein terminates. The log files are named:
<glidein directory>/entry_<entry name>/log/job.<condor-g job nr>.out
<glidein directory>/entry_<entry name>/log/job.<condor-g job nr>.err
The Glidein .out files are readable using any text editor, while the .err files contain the compressed logs of the Condor daemons.
Use the following commands to extract that information as plain text:
glideinWMS/factory/tools/cat_MasterLog.py <err_fname>
glideinWMS/factory/tools/cat_StartdLog.py <err_fname>
glideinWMS/factory/tools/cat_StarterLog.py <err_fname>
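When all three daemon logs are wanted for the same glidein, the three commands can be driven from one small script. The following is only a sketch: the function names are made up for illustration, and it assumes the tools are executable from the current directory and print the extracted log to standard output.

```python
import subprocess

# Tool names as listed above; the directory is relative to the current directory.
TOOLS_DIR = "glideinWMS/factory/tools"
TOOLS = ("cat_MasterLog.py", "cat_StartdLog.py", "cat_StarterLog.py")


def extraction_commands(err_fname, tools_dir=TOOLS_DIR):
    """Build one command line per Condor daemon log stored in a glidein .err file."""
    return [[f"{tools_dir}/{tool}", err_fname] for tool in TOOLS]


def extract_all(err_fname, tools_dir=TOOLS_DIR):
    """Run every tool and return a dict mapping tool name to extracted text.

    Assumption: each tool writes the decoded log to standard output.
    """
    logs = {}
    for tool, cmd in zip(TOOLS, extraction_commands(err_fname, tools_dir)):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        logs[tool] = result.stdout
    return logs
```

Calling extract_all with the path of a job.<condor-g job nr>.err file would then return the Master, Startd and Starter logs in one go.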
Note: If you need Condor log files from a glidein that is still running, use the following Condor command:
<condor dir>/sbin/condor_fetchlog -pool <pool collector> <glidein slot name> -startd MASTER|STARTD|STARTER
The Entry Daemons also summarize the information about completed glideins into
<glidein directory>/entry_<entry name>/log/completed_jobs_<date>.log
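Because every path in the v2.3.x layout follows the same pattern, scripts that collect logs can build them from the glidein directory and the entry name. The helper names below are hypothetical and the job number is treated as an opaque string; this is a sketch of the layout described above, not part of glideinWMS.

```python
from pathlib import PurePosixPath


def entry_log_dir(glidein_dir, entry_name):
    """Return <glidein directory>/entry_<entry name>/log (v2.3.x layout)."""
    return PurePosixPath(glidein_dir) / f"entry_{entry_name}" / "log"


def glidein_job_logs(glidein_dir, entry_name, job_nr):
    """Return the (.out, .err) log paths of one glidein.

    job_nr is the Condor-G job number, passed through unchanged as a string.
    """
    log_dir = entry_log_dir(glidein_dir, entry_name)
    return log_dir / f"job.{job_nr}.out", log_dir / f"job.{job_nr}.err"
```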
Log file locations in glideinWMS v2.4 and later:
With the introduction of privilege separation in glideinWMS, the location of the log files has changed, although a link to the log directory is still maintained from the <glidein directory>. The location of the log files is controlled through the configuration. glideinWMS uses condor_switchboard to control access to the log directories, which makes the deployment more secure.
Glidein factory entry Web monitoring
You can either monitor the factory as a whole, or just a single entry point.
The factory monitoring is located at a URL like the one below
http://gfactory1.my.org/glidefactory/monitor/glidein_v1_0/
Moreover, each entry point has its own history on the Web.
Assuming you have a SanDiego entry, it can be monitored at
http://gfactory1.my.org/glidefactory/monitor/glidein_v1_0/entry_SanDiego/
Historical Web monitoring
The Entry Point Daemons will also create RRD databases and associated graphs for a period of up to one year. This way, one can easily monitor the evolution of the system.
Glidein factory monitoring via WMS tools
You can get the equivalent of the Web page snapshot by using
cd glideinWMS/tools/
python wmsXMLView.py
Glidein factory entry log files
The glidein factory writes two log files per entry point: factory_info.YYYYMMDD.log and factory_err.YYYYMMDD.log.
Assuming you have a SanDiego entry, the log files are in
/home/gfactory/glidein_submit/glidein_v1_0/entry_SanDiego/log
All errors are reported in the factory_err.YYYYMMDD.log file, while factory_info.YYYYMMDD.log contains entries about what the factory is doing.
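A quick health check for an entry is to look at its most recent error log. The sketch below uses a hypothetical helper name and relies only on the file naming described above: since the date is written as YYYYMMDD, sorting the file names as text also sorts them chronologically.

```python
from pathlib import Path


def latest_error_lines(log_dir):
    """Return the non-empty lines of the newest factory_err.YYYYMMDD.log in log_dir."""
    # YYYYMMDD sorts chronologically as plain text, so the last name is the newest.
    candidates = sorted(Path(log_dir).glob("factory_err.*.log"))
    if not candidates:
        return []
    with candidates[-1].open() as log_file:
        return [line.rstrip("\n") for line in log_file if line.strip()]
```

An empty result means that either no error log exists yet or the newest one contains no entries.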
Glidein output
Each glidein creates two files on exit: job.ID.out and job.ID.err.
Assuming you have a SanDiego entry, the log files are in
/home/gfactory/glidein_submit/glidein_v1_0/entry_SanDiego/log
Problems are usually reasonably easy to spot.
Glidein factory ClassAds in the WMS Collector
The glidein factory also advertises summary information in the WMS collector.
Use condor_status:
condor_status -any
and look for glidefactory and glidefactoryclient ads.
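On a busy collector the full listing can be long, so it may help to filter it. The following sketch uses hypothetical function names and assumes only that each ad appears on its own output line mentioning its type; it keeps the lines that contain "glidefactory", which also matches "glidefactoryclient".

```python
import subprocess


def factory_ad_lines(status_output):
    """Keep only the lines that mention glidefactory or glidefactoryclient ads."""
    # "glidefactoryclient" contains "glidefactory", so one substring test covers both.
    return [line for line in status_output.splitlines() if "glidefactory" in line]


def query_factory_ads():
    """Run condor_status -any and return the factory-related lines."""
    result = subprocess.run(
        ["condor_status", "-any"], capture_output=True, text=True, check=True
    )
    return factory_ad_lines(result.stdout)
```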
Looking at ClassAds
As explained in the Data exchange overview, the Entry Point Daemons expose a lot of monitoring information in the ClassAds sent to the WMS collector. While this may not be the most user-friendly interface, most of the monitoring information you'll ever need is present there.
In addition to the tools provided by Condor, the factory provides two tools for looking at the ClassAds: the first returns human-readable but limited text, while the second provides complete XML-formatted output.
glideinWMS/tools/wmsTxtView.py [Entries|Sites|Gatekeepers]
glideinWMS/tools/wmsXMLView.py
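To call the text view from a script, the command line can be assembled with the view name checked first. This is a sketch with a hypothetical helper name; it assumes the tool is started with the python interpreter from the directory that contains glideinWMS, and that the view argument is optional, as the square brackets above suggest.

```python
import subprocess

# View names accepted by wmsTxtView.py, as listed above.
VIEWS = ("Entries", "Sites", "Gatekeepers")


def txt_view_command(view=None, tools_dir="glideinWMS/tools"):
    """Build the wmsTxtView.py command line, rejecting unknown view names."""
    cmd = ["python", f"{tools_dir}/wmsTxtView.py"]
    if view is not None:
        if view not in VIEWS:
            raise ValueError(f"view must be one of {VIEWS}, got {view!r}")
        cmd.append(view)
    return cmd


def run_txt_view(view=None):
    """Run the tool and return its text output."""
    result = subprocess.run(
        txt_view_command(view), capture_output=True, text=True, check=True
    )
    return result.stdout
```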