====== Condor ======

Batch-style processing is available on select nodes of the Unix cluster via Condor.
  
Condor is a software framework for distributed parallel computation. At Minnesota, we use it both to manage workload on dedicated server clusters, and (in some cases) to farm out work to idle desktop computers.
  
Each Physics Linux system is tagged with a "CondorGroup" name corresponding to the research group or entity to which it belongs. It is possible to run jobs on any cluster, but if you are unsure whether you should be using a particular group of machines, please [[mailto:net@physics.umn.edu?subject=Condor Question|check first]].
<note>
You can see which servers belong to each cluster, and what their capabilities are, on the [[https://www.physics.umn.edu/resources/myphys/computing/systems.html|Linux Servers Listing]] page in MyPhys.

If your research group has no computational resources of its own, there is a general-purpose ''phys'' cluster which can be used by anyone.
</note>
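You can also check the tag from the command line with ''condor_status''. This is a sketch: it assumes the machines advertise a matching ''CondorGroup'' attribute in their ClassAds, which is a local convention here (see the job ClassAd of the same name described below), not a standard Condor attribute:
  condor_status -autoformat Name CondorGroup
  condor_status -constraint 'CondorGroup == "phys"'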
  
You can also get an idea of activity within each cluster (although not directly Condor-related) at our [[http://monitor.physics.umn.edu/ganglia/|Ganglia monitoring page]].
    * [[http://research.cs.wisc.edu/condor/tutorials/fermi-2005/|Quick Condor Tutorial for Fermilab]]
  
===== Submitting batch jobs =====
  
==== Use the vanilla universe ====
  
Unless you've specifically used ''condor_compile'' to build your programs, you'll need to submit your jobs in the "vanilla" universe.
  
  universe = vanilla

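For reference, a complete minimal submit file might look like this (a sketch; ''myprog'' and the file names are hypothetical placeholders for your own program and logs):

  universe   = vanilla
  executable = myprog
  arguments  = input.dat
  output     = myprog.$(Cluster).out
  error      = myprog.$(Cluster).err
  log        = myprog.$(Cluster).log
  queue

You would then submit it with ''condor_submit'' followed by the file name, and monitor it with ''condor_q''.
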
==== Limit email output ====
  
To limit the email Condor sends about your jobs to error conditions only, put this in your submit file:

  Notification = error
  
==== Request slot resources ====
  
Almost all our condor slots are set up as "partitionable", which means you can request the specific resources your job needs. Try to specify appropriate values in your submit file so that Condor can reserve resources for your job; without them, your job will be given default values, which may cause it to be held. ''request_memory'' is defined in megabytes, and ''request_disk'' in kilobytes. For example:
  request_cpus = 1
  request_memory = 2048

==== Think about your data access ====

Never use your home directory for job I/O; you should probably be using a ''/data'' volume.

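For example, you might point the job's working directory at a data volume with the standard ''initialdir'' submit option (the path here is a hypothetical placeholder):

  initialdir = /data/mygroup/myuser/jobdir

Relative ''output'', ''error'', and ''log'' paths in the submit file are then resolved under that directory.
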
===== Where do jobs run =====

By default, the cluster which executes a job is determined by the machine where you issue the ''condor_submit'' command. For example, if you submit a job from a CMS system, it will execute on the CMS cluster.

You can override this behavior in your submit file by manipulating the CondorGroup job ClassAd. For example, you could place the following line in your job file to make the jobs run on the CMS server farm:

  +CondorGroup = "cmsfarm"
  
In addition, you can let your jobs run on **any** cluster, with the condition that they will be pre-empted (i.e. killed) if a job with a higher rank (based on the group owning the cluster) needs to run. Since an evicted vanilla-universe job normally restarts from scratch on another machine, this is best suited to short or easily re-run jobs. You can enable this behavior with:
  +CanEvict = True
  
===== Why won't my job run =====
  
Some commands to help analyze why your job isn't matching:
  
  condor_q -analyze <jobid>
  condor_q -better-analyze <jobid>

To see why a job is or is not matching a particular machine:
  condor_q -better-analyze <jobid> -machine <name>
  condor_q -better-analyze:reverse -machine <name>

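If a job has been put on hold (state ''H'' in the ''condor_q'' listing), the hold reason is often the quickest clue; ''-hold'' is a standard ''condor_q'' option:

  condor_q -hold <jobid>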