====== Condor: Batch/Grid Computing ======

Batch-style processing is available on select nodes of the Unix cluster via Condor.

Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workload on dedicated server clusters and (in some cases) to farm out work to idle desktop computers.

Each Physics linux system is tagged with a "CondorGroup" which identifies the cluster it belongs to. You can see which servers belong to each cluster, and what their capabilities are, on the [[https://...]] page.

If your research group has no computational resources of its own, there is a general-purpose ''...'' group available to you.

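You can also query these tags directly with ''condor_status''. A minimal sketch (the group name is only an illustration, and it assumes the ''CondorGroup'' tag is visible as a machine ClassAd attribute):

  # show each slot together with its CondorGroup tag
  condor_status -autoformat Name CondorGroup
  # list only the slots belonging to one cluster (example group name)
  condor_status -constraint 'CondorGroup == "cms"'
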
+ | |||
You can also get an idea of activity within each cluster (although not directly condor-related) at our [[http://...]] pages.

===== Lightning introduction =====

Useful commands for submitting jobs are:

  * ''condor_submit'' - submit jobs described in a submit file
  * ''condor_q'' - show the state of the job queue
  * ''condor_rm'' - remove jobs from the queue
  * ''condor_status'' - show the state of the available slots

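As a minimal sketch of the basic workflow (the submit file name and job id below are just placeholders):

  # hand a job description to the scheduler
  condor_submit myjob.sub
  # watch it in the queue, and remove it by id if necessary
  condor_q
  condor_rm 1234
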
You can find some more information at:

  * [[http://...]]
  * [[https://...]]
  * [[http://...]]

===== Submitting batch jobs =====

==== Use vanilla environment ====

Unless you've specifically used ''condor_compile'' to build your program for the standard universe, your submit file should ask for the vanilla universe:

  universe = vanilla

==== Limit email output ====

To receive email from Condor only when a job runs into trouble, set:

  Notification = error

==== Request slot resources ====

Almost all our condor slots are set up as "partitionable" slots, so you should say how many CPUs and how much memory (in MB) each job needs:

  request_cpus = 1
  request_memory = 2048

==== Think about your data access ====

Never use your home directory for job I/O - you should probably be using a ''/...'' filesystem instead.

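Putting the settings above together, a complete submit file might look something like the following sketch (the executable, arguments and paths are placeholders - point them at your own program and a suitable scratch area rather than your home directory):

  universe        = vanilla
  Notification    = error
  request_cpus    = 1
  request_memory  = 2048
  # placeholder program and working area
  executable      = my_analysis
  initialdir      = /path/to/your/scratch/area
  arguments       = input_$(Process).dat
  output          = job_$(Process).out
  error           = job_$(Process).err
  log             = jobs.log
  # queue ten copies, numbered 0-9 via $(Process)
  queue 10

Submit it with ''condor_submit'' and monitor it with ''condor_q'' as described above.
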
===== Where do jobs run =====

By default, the cluster which executes your jobs is determined by the ''CondorGroup'' of the machine you submit from.

You can override this behavior in your submit file by manipulating the CondorGroup attribute:

  +CondorGroup = "<groupname>"

In addition, you can let your jobs run on **any** cluster, with the condition that they will be pre-empted (i.e. killed) if a job with a higher rank (based on the group owning the cluster) needs to run. You can enable this behavior with:

  +CanEvict = True

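Note that these are custom ClassAd attributes rather than standard submit commands, so they keep the leading ''+'' in the submit file and string values need double quotes. A small sketch (the group name is only an illustration):

  # send this job to the cluster owned by another group (example name)
  +CondorGroup = "cms"
  # or let it run anywhere, accepting that it may be evicted
  +CanEvict = True
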
===== Why won't my job run =====

Some commands which may help you diagnose the problem:

  condor_q -analyze <jobid>
  condor_q -better-analyze <jobid>

Why is my job not running on a particular machine?

  condor_q -better-analyze -machine <machinename> <jobid>
  condor_q -better-analyze:reverse -machine <machinename> <jobid>

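It can also help to step back and look at the pool as a whole with the standard HTCondor query tools, for example:

  # which slots are currently willing to accept jobs at all
  condor_status -avail
  # how your fair-share priority compares with other users
  condor_userprio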
+ | |||
+ | |