Work Queue is Copyright (C) 2009 The University of Notre Dame. This software is distributed under the GNU General Public License. See the file COPYING for details.
Work Queue is a framework for building master/worker applications. In Work Queue, a Master process is a custom, application-specific program that uses the Work Queue API to define and submit a large number of small tasks. The tasks are executed by many Worker processes, which can run on any available machine. A single Master may direct hundreds to thousands of Workers, allowing users to easily construct highly scalable programs.
Work Queue is a stable framework that has been used to create highly scalable scientific applications in biometrics, bioinformatics, economics, and other fields. It can also be used as an execution engine for the Makeflow workflow system.
Work Queue is part of the Cooperating Computing Tools (CCTools). The CCTools package can be downloaded from this web page. Follow the installation instructions to set up the CCTools required for running Work Queue. The full documentation for the Work Queue API can be viewed either within the CCTools package or online.
We assume that you have downloaded and installed the cctools package in the directory CCTOOLS. Next, download the example file for the language of your choice:
gcc work_queue_example.c -o work_queue_example -I${CCTOOLS}/include/cctools -L${CCTOOLS}/lib -ldttools -lm
export PYTHONPATH=${PYTHONPATH}:${CCTOOLS}/lib/python2.6/site-packages
export PERL5LIB=${PERL5LIB}:${CCTOOLS}/lib/perl5/site_perl
./work_queue_example a b c
listening on port 9123...
submitted task: /usr/bin/gzip < a > a.gz
submitted task: /usr/bin/gzip < b > b.gz
submitted task: /usr/bin/gzip < c > c.gz
waiting for tasks to complete...
work_queue_worker MACHINENAME 9123
If you have access to a Condor pool, you can use this shortcut to submit ten workers at once via Condor:
% condor_submit_workers MACHINENAME 9123 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.
% sge_submit_workers MACHINENAME 9123 10
Your job 153083 ("worker.sh") has been submitted
Your job 153084 ("worker.sh") has been submitted
Your job 153085 ("worker.sh") has been submitted
...
When the master completes, if the workers were not shut down by the master, they will still be available, so you can either run another master with the same workers or remove them with kill, condor_rm, or qdel, as appropriate. If you forget to remove them, they will exit automatically after fifteen minutes of idle time. (This can be adjusted with the -t option to the worker, as shown below.)
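For example, to have a worker exit after one hour of idleness rather than the default fifteen minutes (the 3600-second value here is just an illustration):

work_queue_worker -t 3600 MACHINENAME 9123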
q = work_queue_create(port);
for(all tasks) {
    t = work_queue_task_create(command);
    /* add to the task description */
    work_queue_submit(q,t);
}
while(!work_queue_empty(q)) {
    t = work_queue_wait(q);
    work_queue_task_delete(t);
}
work_queue_delete(q);
q = work_queue_create(port);
q = WorkQueue(port)
In the example, we specify a command that takes a single input file and produces a single output file. We then create a task by providing the specified command as an argument:
t = work_queue_task_create(command);
t = Task(command)
work_queue_task_specify_file(t,"/usr/bin/gzip","gzip",WORK_QUEUE_INPUT,WORK_QUEUE_CACHE);
work_queue_task_specify_file(t,infile,infile,WORK_QUEUE_INPUT,WORK_QUEUE_NOCACHE);
work_queue_task_specify_file(t,outfile,outfile,WORK_QUEUE_OUTPUT,WORK_QUEUE_NOCACHE);
t.specify_file("/usr/bin/gzip","gzip",WORK_QUEUE_INPUT,cache=True)
t.specify_file(infile,infile,WORK_QUEUE_INPUT,cache=False)
t.specify_file(outfile,outfile,WORK_QUEUE_OUTPUT,cache=False)
t = work_queue_task_create("$WORK_QUEUE_SANDBOX/gzip < a > a.gz");
t = Task("$WORK_QUEUE_SANDBOX/gzip < a > a.gz")
We can also run a program that is already installed at the remote site where the worker runs, by giving its installed location in the task's command line (and omitting the executable as an input file). For example:
t = work_queue_task_create("/usr/bin/gzip < a > a.gz");
t = Task("/usr/bin/gzip < a > a.gz")
taskid = work_queue_submit(q,t);
taskid = q.submit(t)
t = work_queue_wait(q,5);
t = q.wait(5)
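The wait call is normally placed in a loop that runs until the queue is empty. A minimal sketch of such a loop in C, following the conventions above and also checking the exit status of each returned task before deleting it:

while(!work_queue_empty(q)) {
    /* Wait up to 5 seconds; a NULL return means no task finished in that time. */
    t = work_queue_wait(q, 5);
    if(t) {
        printf("task %d exited with status %d\n", t->taskid, t->return_status);
        work_queue_task_delete(t);
    }
}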
work_queue_task_delete(t);
Deleted automatically when task object goes out of scope
work_queue_delete(q);
Deleted automatically when work_queue object goes out of scope
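Putting these calls together, a minimal master program in C might look like the following sketch. It is modeled on the work_queue_example.c program above; error handling is abbreviated, and the gzip command and file names are only illustrative.

#include "work_queue.h"

#include <stdio.h>

int main(int argc, char *argv[])
{
    struct work_queue *q;
    struct work_queue_task *t;
    int i;

    /* Create a queue listening on the default Work Queue port (9123). */
    q = work_queue_create(WORK_QUEUE_DEFAULT_PORT);
    if(!q) {
        fprintf(stderr, "could not create queue\n");
        return 1;
    }
    printf("listening on port %d...\n", work_queue_port(q));

    /* Submit one gzip task for each file named on the command line. */
    for(i = 1; i < argc; i++) {
        char command[1024], outfile[1024];
        sprintf(outfile, "%s.gz", argv[i]);
        sprintf(command, "/usr/bin/gzip < %s > %s", argv[i], outfile);

        t = work_queue_task_create(command);
        work_queue_task_specify_file(t, argv[i], argv[i], WORK_QUEUE_INPUT, WORK_QUEUE_NOCACHE);
        work_queue_task_specify_file(t, outfile, outfile, WORK_QUEUE_OUTPUT, WORK_QUEUE_NOCACHE);
        work_queue_submit(q, t);
    }

    printf("waiting for tasks to complete...\n");

    /* Wait for tasks to finish and clean each one up as it returns. */
    while(!work_queue_empty(q)) {
        t = work_queue_wait(q, 5);
        if(t) work_queue_task_delete(t);
    }

    work_queue_delete(q);
    return 0;
}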
The project name feature uses the catalog server to track the project names of masters and their locations. It works as follows: the master advertises its project name, hostname, and port to the catalog server. Work Queue workers that are given the master's project name query the catalog server to find the hostname and port of the master with that project name. To use this feature, the master must be set to run in the WORK_QUEUE_MASTER_MODE_CATALOG mode.
For example, to have a Work Queue master advertise its project name as myproject, add the following code snippet after creating the queue:
work_queue_specify_master_mode(q, WORK_QUEUE_MASTER_MODE_CATALOG);
work_queue_specify_name(q, "myproject");
q.specify_mode(WORK_QUEUE_MASTER_MODE_CATALOG)
q.specify_name("myproject")
work_queue_worker -N myproject
% condor_submit_workers -N myproject 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.
% sge_submit_workers -N myproject 10
Your job 153097 ("worker.sh") has been submitted
Your job 153098 ("worker.sh") has been submitted
Your job 153099 ("worker.sh") has been submitted
...
We recommend that you enable a password for your applications. Create a file (e.g. mypwfile) that contains any password (or other long phrase) that you like (e.g. This is my password). The password should be specific to your application and should not match any other password that you use. Note that the contents of the file are taken verbatim as the password; this means that any newline character at the end of the phrase will be considered part of the password.
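For example, one way to create such a file from the shell without a trailing newline (the password phrase here is only an illustration):

% printf 'This is my password' > mypwfile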
Then, modify your master program to use the password:
work_queue_specify_password_file(q,"mypwfile");
q.specify_password_file("mypwfile")
Then give the --password option to provide the same password file to your workers:
work_queue_worker --password mypwfile MACHINENAME 9123
With this option enabled, both the master and the workers will verify that the other has the matching password before proceeding. The password is not sent in the clear, but is securely verified through a SHA1-based challenge-response protocol.
A Work Queue foreman allows Work Queue workers to be managed in a hierarchical manner. Each foreman connects to the Work Queue master and accepts tasks as though it were a worker. It then accepts connections from Work Queue workers and dispatches tasks to them as if it were the master.
A setup using foremen is beneficial when there are common files that need to be transmitted to workers and cached for subsequent executions. In this case, the foremen transfer the common files to their workers without requiring any intervention from the master, thereby lowering the communication and transfer overheads at the master.
Foremen are also useful when harnessing resources from multiple clusters. A foreman can be run on the head node of a cluster acting as a single communications hub for the workers in that cluster. This reduces the network connections leaving the cluster and minimizes the transfer costs for sending data into the cluster over wide area networks.
To start a Work Queue foreman, invoke work_queue_worker with the --foreman argument. The foreman can advertise a project name using the -N option, which lets workers find and connect to it without being given its hostname and port. On the other end, the foreman connects to the master whose project name is given in the -M argument (alternatively, the master's hostname and port can be provided instead of its project name).
For example, to run a foreman that works for a master with project name myproject and advertises itself as foreman_myproject:
% work_queue_worker --foreman -M myproject -N foreman_myproject
To run a worker that connects to a foreman, specify the foreman's project name in the -N option. For example:
% work_queue_worker -N foreman_myproject
By default, Work Queue workers advertise a single slot spanning a single core, and therefore execute only one single-core task at a time.
The multi-slot feature enables workers to span all of their available resources (cores, memory, disk) and simultaneously execute as many tasks as can be accommodated within those resources.
To start a multi-slot worker, you can instruct the worker to automatically detect and report the number of cores at its execution site:
% work_queue_worker --cores 0 MACHINENAME 9123
Note that the worker always reports the available memory and disk space observed at its execution site.
You can also manually specify the cores, memory, and disk that a worker should report to the master as available, using the --cores, --memory, and --disk command-line arguments (memory and disk are given in MB). For example, to have a worker report 2 cores, 1 GB of memory, and 8 GB of disk as available for task execution, do:
% work_queue_worker --cores 2 --memory 1000 --disk 8000 MACHINENAME 9123
To take advantage of the multi-slot workers, the tasks submitted to the Work Queue need to be annotated with their resource requirements in terms of cores, memory, and disk.
work_queue_task_specify_cores(t, 2);    // needs 2 cores
work_queue_task_specify_memory(t, 100); // needs 100 MB memory
work_queue_task_specify_disk(t, 1000);  // needs 1 GB disk
t.specify_cores(2)    # needs 2 cores
t.specify_memory(100) # needs 100 MB memory
t.specify_disk(1000)  # needs 1 GB disk
Note that if no requirements are specified, a task consumes an entire worker. If one or more requirements are specified, the task is assumed to consume those amounts, and any unlabeled resource requirements are assumed to be negligible. For example, if you label a task as using 1 core but don't specify its memory or disk requirements, Work Queue will schedule two such tasks together on a worker with two cores, regardless of their actual memory or disk usage.
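As an illustrative sketch in C (the simulate command and file names are hypothetical), labeling two tasks with one core each allows a worker started with --cores 2 to run them side by side:

/* Each task is labeled as needing only 1 core; memory and disk are left
   unspecified and are therefore treated as negligible. */
t = work_queue_task_create("./simulate < in.1 > out.1");
work_queue_task_specify_cores(t, 1);
work_queue_submit(q, t);

t = work_queue_task_create("./simulate < in.2 > out.2");
work_queue_task_specify_cores(t, 1);
work_queue_submit(q, t);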
work_queue_task_specify_file(t,"a.$OS.$ARCH","a",WORK_QUEUE_INPUT,WORK_QUEUE_CACHE);
t.specify_file("a.$OS.$ARCH","a",WORK_QUEUE_INPUT,cache=True)
Note that this feature is specifically designed for specifying and distinguishing input file names for different platforms and architectures. It is distinct from the $WORK_QUEUE_SANDBOX shell environment variable, which exports the location of the worker's working directory to the task's execution environment.
t = work_queue_cancel_by_tasktag(q,"task3");
t = q.cancel_by_tasktag("task3")
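A task can be given a tag when it is created so that it can later be identified and cancelled. A minimal sketch in C, reusing the gzip task from the earlier example:

/* Tag the task at submission time so it can be referred to later. */
t = work_queue_task_create("/usr/bin/gzip < c > c.gz");
work_queue_task_specify_tag(t, "task3");
work_queue_submit(q, t);

/* ... later, cancel it by tag; the cancelled task is returned so it
   can be examined and then deleted. */
t = work_queue_cancel_by_tasktag(q, "task3");
if(t) work_queue_task_delete(t);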