Condor: Batch/Grid Computing

Batch-style processing is available on select nodes of the Unix cluster via Condor.

Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workloads on dedicated server clusters and (in some cases) to farm out work to idle desktop computers.

Each Physics Linux system is tagged with a “CondorGroup” name corresponding to the research group or entity to which it belongs. It is possible to run jobs on any cluster, but if you are unsure whether you should be using a particular group of machines, please check first. <note> You can see which servers belong to each cluster, and what their capabilities are, on the Linux Servers Listing page in MyPhys.

If your research group has no computational resources of its own, there is a general-purpose phys cluster which can be used by anyone. </note>

You can also get an idea of activity within each cluster (although not directly Condor-related) at our Ganglia monitoring page.

Lightning Condor Summary

Useful commands for submitting jobs are:

condor_run - for quick & easy job submission
condor_submit - for full control

Useful commands for job and machine status are:

condor_status - shows active machines and queues
condor_status -submitters - shows job submitter summary
condor_q - shows jobs in the local job queue
condor_q -global - shows jobs in the global job queue

You can find some more information at:

Submitting batch jobs

Use the vanilla universe

Unless you've specifically used condor_compile to build your programs, you'll need to submit your jobs in the “vanilla” universe.

universe = vanilla

Limit email output

Notification = error

Request slot resources

Almost all our Condor slots are set up as “partitionable”, which means you can request the resources your job needs. Try to specify appropriate values in your submit file so that Condor can reserve resources for your job. If you don't, your job will be given default values, which may cause it to be held. For example:

request_cpus = 1
request_memory = 2048
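
Putting the settings from this section together, a minimal submit file might look like the sketch below. The submit file name `example_job.sub`, the executable name `my_job.sh`, and the closing `queue` line are illustrative assumptions rather than site requirements. Note that `request_memory` is read as megabytes when no unit is given.

```shell
#!/bin/sh
# Write a minimal Condor submit file.
# The file name and executable name below are hypothetical placeholders.
SUBMIT_FILE="${SUBMIT_FILE:-example_job.sub}"

# Quoted heredoc: the contents are written exactly as shown.
cat > "$SUBMIT_FILE" <<'EOF'
# Run an ordinary program (one not built with condor_compile)
universe       = vanilla
# Hypothetical executable; replace with your own script or binary
executable     = my_job.sh
# Only send email when something goes wrong
notification   = error
# Resources to reserve from the partitionable slot (memory in MB)
request_cpus   = 1
request_memory = 2048
queue
EOF

echo "Wrote $SUBMIT_FILE; submit it with: condor_submit $SUBMIT_FILE"
```

Adjust the two `request_` values to what your job actually uses; asking for much more than you need can make the job harder to match.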

Think about your data access

Never use your home directory for job I/O - you should probably be using a /data volume.
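
One way to keep job I/O off your home directory is to point the job's working directory, output, error, and log files at a directory on a /data volume. The sketch below assumes a hypothetical `DATA_DIR` variable that you would set to your own directory on a /data volume; it falls back to a temporary directory only so the script runs anywhere. The file names and the executable `my_job.sh` are likewise placeholders.

```shell
#!/bin/sh
# DATA_DIR is an assumption: set it to your own directory on a /data volume.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
JOB_DIR="$DATA_DIR/condor_example"
mkdir -p "$JOB_DIR"

# Unquoted heredoc so that $JOB_DIR is expanded by the shell.
# \$(Cluster) and \$(Process) are escaped so they reach Condor as macros.
cat > "$JOB_DIR/io_example.sub" <<EOF
universe   = vanilla
executable = my_job.sh
initialdir = $JOB_DIR
output     = job.\$(Cluster).\$(Process).out
error      = job.\$(Cluster).\$(Process).err
log        = job.\$(Cluster).log
queue
EOF

echo "Job files will be written under $JOB_DIR"
```

With `initialdir` set, the relative `output`, `error`, and `log` paths are resolved inside that directory, so nothing is written to your home directory.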

Where do jobs run

By default, the cluster which executes a job is determined by the machine where you issue the condor_submit command - for example, if you submit a job from a CMS system, it will execute on the CMS cluster.

You can override this behavior in your submit file by setting the CondorGroup attribute in the job ClassAd. For example, you could place the following line in your submit file to make the jobs run on the CMS server farm:

+CondorGroup = "cmsfarm"

In addition, you can let your jobs run on any cluster, with the condition that they will be pre-empted (i.e. killed) if a job with a higher rank (based on the group owning the cluster) needs to run. You can enable this behavior with:

+CanEvict = True
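
As a sketch of where these attributes go, the helper below writes a submit file with the placement lines ahead of the `queue` statement, which is where they need to appear to apply to the queued jobs. The function name `write_submit`, the output file names, and the executable `my_job.sh` are hypothetical; only `+CondorGroup` and `+CanEvict` come from the text above.

```shell
#!/bin/sh
# Write a submit file with optional placement attributes.
# Usage: write_submit FILE [GROUP] [CAN_EVICT]
#   GROUP     - CondorGroup to target, e.g. cmsfarm
#               (empty: keep the default for the submit machine)
#   CAN_EVICT - "yes" to allow running on any cluster, subject to pre-emption
write_submit() {
    file=$1
    group=${2:-}
    can_evict=${3:-no}
    {
        echo 'universe   = vanilla'
        echo 'executable = my_job.sh'   # hypothetical executable
        if [ -n "$group" ]; then
            echo "+CondorGroup = \"$group\""
        fi
        if [ "$can_evict" = yes ]; then
            echo '+CanEvict = True'
        fi
        echo 'queue'                    # placement lines must come before this
    } > "$file"
}

write_submit cms_job.sub cmsfarm        # target the CMS server farm
write_submit anywhere_job.sub "" yes    # run anywhere, may be pre-empted
```

Jobs submitted with `+CanEvict = True` can be killed at any time, so this option suits work that is short or can safely be restarted.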

Why won't my job run

Some commands to help analyze why your job isn't matching:

condor_q -analyze <jobid>
condor_q -better <jobid>

Why is my job not running on a particular machine

condor_q -better <jobid> -machine <name>
condor_q -better:reverse -machine <name>
