groups:bes3:users:howell:farm:howell-t1

Differences

This shows you the differences between two versions of the page.

groups:bes3:users:howell:farm:howell-t1 [2011/05/23 13:22] howell
groups:bes3:users:howell:farm:howell-t1 [2011/05/27 11:57] (current) howell
Line 21: Line 21:
  
 === Log ===
 +== 2011-05-27 ==
 +  * Problem appears to be fixed in commit bb2f314196b3758a6ffb971636e56b9daa8f321c; it was caused by temporary output files being kept in a folder separate from the staging directory. After a job finishes, the staging directory is cleaned up (if the job succeeded), but the temporary output folder was not. We now use only the staging directory, and commit 668445192aa5659db52200c4773a43311d11ccd4 adds some insurance that it gets cleaned up and that no data files get synced to the work directory.
 +  * Also added commit d147245991fe89f8cdfb3e3846d98e5145bec371, which prevents completed jobs from re-running. Jobs would never re-run a complete stage in the first place, but a temporary stage (whose output isn't kept) would be re-run in anticipation of non-temporary stages requiring its output. This saves us a generation step when reconstruction, etc., are all finished.
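The behaviour described in the two entries above can be sketched roughly as follows. This is only an illustration, not the code from the commits: the function names, the `"name"`/`"temporary"`/`"complete"` keys and the convention that a failed job raises an exception are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_staging(job, staging_root):
    """Run `job` with a single staging directory as its only scratch space.

    Assumption: `job` is a callable taking the staging path and raising on
    failure. The directory is removed only after a successful run, so a
    failed job leaves its files behind for inspection.
    """
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=staging_root))
    job(staging)            # expected to raise if the job fails
    shutil.rmtree(staging)  # reached only if the job succeeded
    return staging


def stages_to_run(stages):
    """Return the names of the stages that still need to run.

    Assumption: `stages` is an ordered list of dicts with the hypothetical
    keys "name", "temporary" and "complete".
    """
    # If every kept (non-temporary) stage is complete, nothing downstream
    # needs the temporary output, so no stage is re-run at all.
    if all(s["complete"] for s in stages if not s["temporary"]):
        return []
    # Otherwise temporary stages are regenerated along with every kept
    # stage that has not finished yet.
    return [s["name"] for s in stages if s["temporary"] or not s["complete"]]
```

With a temporary generation stage and a completed reconstruction stage, `stages_to_run` returns an empty list, which corresponds to the saved generation step.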
 == 2011-05-23 ==
   * even after sed script, farm still broken
Line 35: Line 38:
   * checked /hdfs/bes3/.../mc, but howell-T1-0-...mn.rtraw wasn't there!
   * check job script to see if the command to move it was broken (remember we changed mv -> cp)
 +
   $ cd /dev/shm/staging-howell/howell-T1/howell-T1-0-R11517-E185
   $ grep cp execute
Line 42: Line 46:
  
   * no cp for the rtraw file! clearly must regenerate executables for /all/ jobs, as we can't trust any of them now.
 +
   $ cd /data/bes3d2/bes3/users/howell/simulations/howell-T1/work/jobs
   $ rm */execute
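Before removing the scripts, the `grep cp execute` check could be repeated across every job to see how many are affected. A minimal sketch, assuming each job directory holds a plain-text `execute` script and that a healthy script has a `cp` line mentioning the `.rtraw` file; the function name `jobs_missing_copy` is made up for this example:

```python
from pathlib import Path


def jobs_missing_copy(jobs_dir, suffix=".rtraw"):
    """List job directories whose execute script has no cp line for `suffix`."""
    missing = []
    for script in sorted(Path(jobs_dir).glob("*/execute")):
        lines = script.read_text().splitlines()
        # A line counts if it contains a standalone "cp" token and
        # mentions a file with the given suffix.
        has_copy = any("cp" in line.split() and suffix in line for line in lines)
        if not has_copy:
            missing.append(script.parent.name)
    return missing
```

Calling `jobs_missing_copy(".")` from the `work/jobs` directory would return the names of the jobs whose scripts lack the copy step.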
-  ... 
  
   * Now re-run job 0
groups/bes3/users/howell/farm/howell-t1.1306174970.txt.gz · Last modified: 2011/05/23 13:22 by howell