This shows you the differences between two versions of the page.
computing:department:unix:jobs:condor [2012/07/25 18:17] – [Condor: Batch/Grid Computing] allan | computing:department:unix:jobs:condor [2015/12/14 16:43] (current) – [Where do jobs run] allan | ||
---|---|---|---|
Line 2: | Line 2: | ||
Batch-style processing is available on select nodes of the Unix cluster via Condor. | Batch-style processing is available on select nodes of the Unix cluster via Condor. | ||
- | Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workload on dedicated server clusters, and to farm out work to idle desktop computers | + | Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workload on dedicated server clusters, and (in some cases) |
- | Each Physics linux system is tagged with a "cluster" name corresponding to the research group or entity to which it belongs. It is possible to run jobs on any cluster, but if you are unsure whether you should be using a particular group of machines please [[mailto: | + | Each Physics linux system is tagged with a "CondorGroup" name corresponding to the research group or entity to which it belongs. It is possible to run jobs on any cluster, but if you are unsure whether you should be using a particular group of machines please [[mailto: |
+ | < | ||
+ | You can see which servers belong to each cluster, and what their capabilities are, on the [[https:// | ||
- | You can see which machines belong to each cluster, and what their capabilities are, on the [[https:// | + | If your research group has no computational resources of its own, there is a general-purpose '' |
+ | </note> | ||
You can also get an idea of activity within each cluster (although not directly condor-related) at our [[http:// | You can also get an idea of activity within each cluster (although not directly condor-related) at our [[http:// | ||
Line 29: | Line 32: | ||
* [[http:// | * [[http:// | ||
- | ===== Site Specific Information | + | ===== Submitting batch jobs ===== |
- | Our physics " | + | ==== Use vanilla environment ==== |
- | CondorGroup = "cmsfarm" | + | Unless you've specifically used '' |
- | You should read the manual page for '' | + | universe = vanilla |
+ | |||
+ | ==== Limit email output ==== | ||
- | Some example CondorGroups: | + | Notification = error |
- | * phys - Physics machines (general-purpose linux workstations) | + | ==== Request slot resources ==== |
- | * ftpi - FTPI linux workstations (FTPI use only!) | + | |
- | * twins - BES3 compute farm (BES3 use only!) | + | |
- | * minos - MINOS workstations (MINOS use only!) | + | |
- | * cms - CMS servers (CMS use only!) | + | |
- | * astro - Astronomy workstations (Astronomy use only!) | + | |
- | Anyone may submit jobs to "phys" | + | Almost all our condor slots are set up as "partitionable", |
+ | request_cpus = 1 | ||
+ | request_memory = 2048 | ||
- | We currently support | + | ==== Think about your data access ==== | ||
+ | |||
+ | Never use your home directory for job I/O - you should probably be using a ''/ | ||
+ | |||
+ | ===== Where do jobs run ===== | ||
+ | |||
+ | By default, | ||
+ | |||
+ | You can override this behavior in your submit file by manipulating the CondorGroup job ClassAd. For example, you could place the following line in your job file to make the jobs run on the CMS server farm: | ||
+ | |||
+ | +CondorGroup = " | ||
+ | |||
+ | In addition, you can let your jobs run on **any** cluster, on the condition that they will be pre-empted (i.e. killed) if a job with a higher rank (based on the group owning the cluster) needs to run. You can enable this behavior with: | ||
+ | +CanEvict = True | ||
+ | |||
+ | ===== Why won't my job run ===== | ||
+ | |||
+ | Some commands to help analyze why your job isn't matching | ||
+ | |||
+ | condor_q -analyze < | ||
+ | condor_q -better < | ||
+ | |||
+ | Why a job is not running on a particular machine: | ||
+ | condor_q -better < | ||
+ | condor_q -better: | ||
+ | |||
+ | |
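
As a rough sketch, the submit-file settings introduced in the newer revision could be combined into a single submit description file like the one below. The executable name, the output/error/log file names, and the `initialdir` path are hypothetical placeholders, not values from this page. The commented-out `+CondorGroup` value is the one that appears in the older revision and may no longer be valid, so confirm it before use.

```
# Sketch of an HTCondor submit description file.
# Comments are kept on their own lines.
# Hypothetical placeholders: my_job.sh, job.out, job.err, job.log,
# and /path/to/data/area.

universe       = vanilla

# Only send email when the job fails.
notification   = Error

# Partitionable slots: request only what the job needs.
# request_memory is interpreted in megabytes by default.
request_cpus   = 1
request_memory = 2048

# Keep job I/O out of your home directory (placeholder path, replace it).
initialdir     = /path/to/data/area
executable     = my_job.sh
output         = job.out
error          = job.err
log            = job.log

# Optional, site-specific ClassAd attributes described on this page.
# Uncomment only if you are entitled to use them.
# +CondorGroup = "cmsfarm"
# +CanEvict    = True

queue
```

A file like this would typically be handed to `condor_submit`. If the job then stays idle, the `condor_q -analyze` and `condor_q -better` commands listed above are the place to start looking.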