====== Condor: Batch/Grid Computing ======
Batch-style processing is available on select nodes of the Unix cluster via Condor.
Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workload on dedicated server clusters and (in some cases) to farm out work to idle desktop computers.
Each Physics linux system is tagged with a "CondorGroup" identifying the cluster it belongs to.
You can see which servers belong to each cluster, and what their capabilities are, at [[https://...]].

If your research group has no computational resources of its own, there is a general-purpose ''...'' cluster available to all users.

You can also get an idea of activity within each cluster (although not directly condor-related) at our [[http://...]] pages.
===== Lightning introduction =====
Useful commands for submitting jobs are:
  * ''condor_submit <submitfile>'' - submit a job described by a submit file
  * ''condor_q'' - list jobs in the queue
  * ''condor_rm <jobid>'' - remove a job from the queue
  * ''condor_status'' - show the state of the condor pool
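As a quick sketch tying these commands together (the script and file names here are hypothetical), a minimal submit description file might look like:

<code>
# hello.sub - a minimal submit description file (hypothetical file names)
universe   = vanilla
executable = hello.sh
output     = hello.out
error      = hello.err
log        = hello.log
queue
</code>

You would submit this with ''condor_submit hello.sub'' and then monitor it with ''condor_q''.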
You can find some more information at:

  * [[http://...]]
  * [[https://...]]
  * [[http://...]]
===== Submitting batch jobs =====

==== Use vanilla universe ====

Unless you've specifically used ''condor_compile'' to build your program for the standard universe, your submit file should specify the vanilla universe:

<code>
universe = vanilla
</code>
+ | |||
==== Limit email output ====

To get mail about a job only when something goes wrong, put this in your submit file:

<code>
Notification = error
</code>
+ | |||
==== Request slot resources ====

Almost all our condor slots are set up as "partitionable" slots, so your submit file should say what resources your job needs (''request_memory'' is given in megabytes):

<code>
request_cpus = 1
request_memory = 2048
</code>
==== Think about your data access ====

Never use your home directory for job i/o - you should probably be using a ''/...'' filesystem instead.
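For example (the ''/data/mygroup'' path below is purely illustrative - substitute a data filesystem your group actually has), you can direct all job i/o away from your home directory in the submit file:

<code>
# illustrative submit-file fragment; /data/mygroup is a made-up path
initialdir = /data/mygroup/myjob
output     = job.$(Cluster).$(Process).out
error      = job.$(Cluster).$(Process).err
log        = job.$(Cluster).log
</code>

''$(Cluster)'' and ''$(Process)'' are expanded by ''condor_submit'', so each queued job writes to its own files.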
===== Where do jobs run =====

By default, the cluster which executes a job is determined by the machine where you issue the ''condor_submit'' command.

You can override this behavior in your submit file by manipulating the CondorGroup job ClassAd. For example, you could place the following line in your job file to make the jobs run on the CMS server farm:

<code>
+CondorGroup = "cms"
</code>
In addition, you can let your jobs run on **any** cluster, with the condition that it will be pre-empted (i.e. killed) if a job with a higher rank (based on the group owning the cluster) needs to run. You can enable this behavior with:

<code>
+CanEvict = True
</code>
===== Why won't my job run =====
Some commands that can help you work out why your job isn't running:

<code>
condor_q -analyze <jobid>
condor_q -better-analyze <jobid>
</code>

Why is a job not running on a particular machine?

<code>
condor_q -better-analyze:reverse -machine <machinename> <jobid>
</code>