GlideinWMS The Glidein-based Workflow Management System

WMS Factory

Monitoring

Glidein Factory monitoring

Monitoring is an essential part of any service. It is needed both to maintain the health of the system and to tune the system for anticipated growth. The various ways you can monitor a Glidein Factory are described below.

Log files

The Factory Daemon, the Entry Daemons, Condor-G and the Glideins all write extensive log files. The logs are kept for a week and then deleted.

Log file locations in glideinWMS v2.3.x and earlier:

The Glidein Factory Daemon log files are located in

<glidein directory>/log/factory_info.<date>.log
<glidein directory>/log/factory_err.<date>.log

Each Entry Daemon has its log files in

<glidein directory>/entry_<entry name>/log/factory_info.<date>.log
<glidein directory>/entry_<entry name>/log/factory_err.<date>.log

Each Entry Daemon uses one Condor-G job log for each client it is serving:

<glidein directory>/entry_<entry name>/log/condor_activity_<date>_<client-name>.log

Each Glidein also writes two log files, which are transferred back to the factory node after the glidein terminates. The log files are named:

<glidein directory>/entry_<entry name>/log/job.<condor-g job nr>.out
<glidein directory>/entry_<entry name>/log/job.<condor-g job nr>.err

The Glidein .out files are readable using any text editor, while the .err files contain the compressed logs of the Condor daemons.
Use the following commands to extract that information in plain text format:

glideinWMS/factory/tools/cat_MasterLog.py <err_fname>
glideinWMS/factory/tools/cat_StartdLog.py <err_fname>
glideinWMS/factory/tools/cat_StarterLog.py <err_fname>

Note: If you need Condor log files from a glidein that is still running, use the following Condor command:

<condor dir>/sbin/condor_fetchlog -pool <pool collector> <glidein slot name> -startd MASTER|STARTD|STARTER

The Entry Daemons also summarize the information about completed glideins into

<glidein directory>/entry_<entry name>/log/completed_jobs_<date>.log
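
The v2.3.x layout described above can be summarized as a small path helper, which may be handy in scripts that collect logs. This is a sketch only: the function name entry_log_paths is hypothetical, and it assumes that every <date> placeholder uses the YYYYMMDD form.

```python
from datetime import date
from pathlib import PurePosixPath


def entry_log_paths(glidein_dir, entry_name, day, client_name):
    """Return the per-entry log paths for one day (hypothetical helper).

    Assumption: <date> is formatted as YYYYMMDD in all file names.
    """
    stamp = day.strftime("%Y%m%d")
    log_dir = PurePosixPath(glidein_dir) / f"entry_{entry_name}" / "log"
    return {
        "info": log_dir / f"factory_info.{stamp}.log",
        "err": log_dir / f"factory_err.{stamp}.log",
        "condor_activity": log_dir / f"condor_activity_{stamp}_{client_name}.log",
        "completed_jobs": log_dir / f"completed_jobs_{stamp}.log",
    }


# Example with made-up values for the directory, entry, date and client name.
paths = entry_log_paths(
    "/home/gfactory/glidein_submit/glidein_v1_0",
    "SanDiego",
    date(2010, 1, 31),
    "frontend1",
)
print(paths["err"])
```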

Log file locations in glideinWMS v2.4 and later:

With the introduction of privilege separation in glideinWMS, the location of the log files has changed, although a link to the log directory is still maintained from the <glidein directory>. The location of the log files is controlled through the configuration. glideinWMS uses condor_switchboard to control access to the log directories, which makes the deployment more secure.

Glidein factory entry Web monitoring

You can either monitor the factory as a whole, or just a single entry point.

The factory monitoring is located at a URL like the one below

http://gfactory1.my.org/glidefactory/monitor/glidein_v1_0/

Moreover, each entry point has its own history on the Web.

Assuming you have a SanDiego entry, it can be monitored at

http://gfactory1.my.org/glidefactory/monitor/glidein_v1_0/entry_SanDiego/
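
The two URLs above follow a simple pattern: the entry page lives in an entry_<entry name>/ subdirectory of the factory monitoring page. A minimal sketch of that pattern, with monitor_url as a hypothetical helper name:

```python
def monitor_url(factory_url, entry_name=None):
    """Return the factory monitoring URL, or the URL of one entry if given.

    Assumption: entry pages always sit under entry_<entry name>/,
    as in the SanDiego example above.
    """
    url = factory_url.rstrip("/") + "/"
    if entry_name is not None:
        url += f"entry_{entry_name}/"
    return url


base = "http://gfactory1.my.org/glidefactory/monitor/glidein_v1_0/"
print(monitor_url(base))
print(monitor_url(base, "SanDiego"))
```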

Historical Web monitoring

The Entry Point Daemons will also create RRD databases and associated graphs for a period of up to one year. This way, one can easily monitor the evolution of the system.

Glidein factory monitoring via WMS tools

You can get the equivalent of the Web page snapshot by using

cd glideinWMS/tools/
python wmsXMLView.py

Glidein factory entry log files

The glidein factory writes two log files per entry point: factory_info.YYYYMMDD.log and factory_err.YYYYMMDD.log.

Assuming you have a SanDiego entry, the log files are in

/home/gfactory/glidein_submit/glidein_v1_0/entry_SanDiego/log

All errors are reported in the factory_err.YYYYMMDD.log file, while factory_info.YYYYMMDD.log contains entries about what the factory is doing.
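
Since all errors go to the factory_err files, a quick health check is to list the error logs that are not empty. The sketch below does only that; the function name nonempty_error_logs is hypothetical, and it assumes that an empty error log means no errors were reported that day.

```python
from pathlib import Path


def nonempty_error_logs(log_dir):
    """Return the factory_err.*.log files in log_dir that contain data.

    The result is sorted by file name, so with YYYYMMDD dates the
    oldest log comes first.
    """
    return sorted(
        path
        for path in Path(log_dir).glob("factory_err.*.log")
        if path.stat().st_size > 0
    )


if __name__ == "__main__":
    # Path taken from the SanDiego example above.
    log_dir = "/home/gfactory/glidein_submit/glidein_v1_0/entry_SanDiego/log"
    for path in nonempty_error_logs(log_dir):
        print(path)
```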

Glidein output

Each glidein creates two files on exit: job.ID.out and job.ID.err.

Assuming you have a SanDiego entry, the log files are in

/home/gfactory/glidein_submit/glidein_v1_0/entry_SanDiego/log

Problems are usually reasonably easy to spot.

Glidein factory ClassAds in the WMS Collector

The glidein factory also advertises summary information in the WMS collector.

Use condor_status:

condor_status -any

and look for glidefactory and glidefactoryclient ads.
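
On a busy collector the factory ads can be hard to pick out by eye. The sketch below filters them from ClassAd text in the long format (one "Attribute = value" pair per line, ads separated by blank lines), such as condor_status -any -long produces. The function names are hypothetical, the sample ads are made up, and the use of the MyType attribute to tell the ad types apart is an assumption.

```python
def parse_long_ads(text):
    """Split long-format ClassAd text into a list of {attribute: raw value} dicts."""
    ads = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current ad.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def factory_ads(ads, attr="MyType"):
    """Keep only glidefactory and glidefactoryclient ads.

    Assumption: the ad type is stored as a quoted string in `attr`.
    """
    wanted = {'"glidefactory"', '"glidefactoryclient"'}
    return [ad for ad in ads if ad.get(attr) in wanted]


# Made-up sample, for illustration only.
sample = '''MyType = "glidefactory"
Name = "SanDiego@v1_0@factory"

MyType = "Machine"
Name = "slot1@node"
'''
for ad in factory_ads(parse_long_ads(sample)):
    print(ad["Name"])
```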

Looking at ClassAds

As explained in the Data exchange overview, the Entry Point Daemons expose a lot of monitoring information in the ClassAds sent to the WMS collector. While this may not be the most user-friendly interface, most of the monitoring information you will ever need is present there.

On top of the Condor-provided tools, the factory provides two tools to look at the ClassAds: the first one returns human-readable but limited text, while the other provides complete XML-formatted output.

glideinWMS/tools/wmsTxtView.py [Entries|Sites|Gatekeepers]
glideinWMS/tools/wmsXMLView.py